What is the behavior when using the following construct (latest version of Spring). I'm unable to find it in the documentation.
#RabbitListener(queues = {"q1", "q2", "q3"})
public class MyListener {
In which order are the messages from the 3 queues processed?
It is indeterminate - 3 basicConsume operations are performed on the consumer channel (if you increase concurrentConsumers it's 3 per consumer). The basicConsume operations are normally performed in the order the queues are defined (in all cases unless one or more of the queues is temporarily "missing").
The broker will send messages from each queue up to the prefetchCount (basicQos) for each queue (default 1).
I don't know the actual algorithm used by the broker in this scenario but you should assume it to be indeterminate - Spring AMQP will deliver them to the listener(s) in the order received from the broker.
EDIT
I just ran a test (2 queues each with 2 existing messages) and they were delivered round-robin - q1m1, q2m1, q1m2, q2m2 when the prefetch was 1.
With prefetch set to 4, I see q1m1, q1m2, q2m1, q2m2.
Of course, when the queues are empty, messages will generally arrive in the order they arrive at the broker.
EDIT2
See Consumer Prefetch.
Spring AMQP uses the basicQos variant with no global arg, so the default (false) is used. That means the prefetch is per-consumer.
Related
in the RabbitMQ specification there can be found:
Section 4.7 of the AMQP 0-9-1 core specification explains the conditions under which ordering is guaranteed: messages published in one channel, passing through one exchange and one queue and one outgoing channel will be received in the same order that they were sent. RabbitMQ offers stronger guarantees since release 2.7.0.
but what if there is binding which goes like Exchange 1 -> Exchange 2 -> Queue 1.
Is the ordering still guaranteed?
We assumed it did but we found in our application that it might not be the case. We use spring-rabbit-2.1.6-RELEASE (which uses amqp-client-5.4.3).
The publishers, binding and consumers are following:
Client 1 publishes to Exchange 1 -> Exchange 2 -> Queue 1 - consumed by Client 2
-> Queue 2 - consumed by Client 3
We can see that Client 1 publishes 3 messages in following order:
Message 1
Message 2
Message 3
But the both Client 2 and Client 3 receive the messages in following order:
Message 3
Message 1
Message 2
EDIT 1 (Spring configuration)
For the publisher (Client 1) there is following XML configuration used (no extra properties set on rabbit's ConnectionFactory):
<rabbit:connection-factory channel-cache-size="1" cache-mode="CHANNEL" id="respConnFactory" addresses="..." virtual-host="..." username="..." password="..." executor="connExec"/>
<!-- the executor has no meaning for such usingas mentioned by Gary -->
The publishing is done via:
AmqpTemplate::send(String exchange, String routingKey, Message message)
in a dedicated thread.
Client 2 uses default spring configuration with SimpleMessageListenerContainer.
Client 3 isn't actually our application so I don't know the real setup. That was them who reported us a bug that the messages aren't ordered properly.
Of course there is still possibility that we logged the message publishing with some bug. But I triple checked it - it's from a single thread and there is sequence number in each message's custom header which is incremented correctly on Client 1.
EDIT 2
I did further analysis in order to find out how often the wrong message sorting happens. Here are the result:
I took the logs and data +-2 hours around the incident (4 hours in total) and there were 42706 messages sent and only 3 of them had wrong sorting on Client 2. All 3 messages were sent within interval of 7 ms.
Then I randomly took another time window of length 14 hours. There were 531904 messages sent and all of them received by Client 2 in correct order. The average message rate is ~11 messages per second.
The messages aren't distributed evenly so the 3 messages within 7 ms isn't anything especial - quite an opposite. It's common that within 3-5 ms there are multiple messages sent.
From this analysis I assume there was something weird going on on the rabbit cluster. Unfortunately I don't have the logs from it anymore.
The chance of some kind of race condition is from my point of view very low.
Thank you,
Frank
Spring AMQP uses a cache for channels; in a multi-threaded environment, there is no guarantee that the same thread will always use the same channel; hence ordering is not guaranteed.
With the current releases, the solution is to use scoped operations which will guarantee that a series of publications will occur on the same channel and guarantee order.
In the next release (2.3, available later this year), we have also added the ThreadChannelConnectionFactory which does the same thing.
It happened again and we were able to figure it out.
The whole time it was rabbit health indicator who was responsible for channel recreation and therefore for wrong order sorting. There was a job which periodically called the health endpoint.
As Gary correctly mentioned:
Spring AMQP uses a cache for channels; in a multi-threaded environment, there is no guarantee that the same thread will always use the same channel; hence ordering is not guaranteed.
The health status is checked from different thread and it uses the producer's channel.
As short term solution this will work:
management.health.rabbit.enabled=false
The sorting is guaranteed if the producer is really single thread and the connection factory is setup as in the description.
Another (and maybe proper) solution is to create separate ConnectionFactory and don't use the auto-configuration for rabbit health check.
#Bean("rabbitHealthIndicator")
public HealthIndicator rabbitHealthIndicator(ConnectionFactory healthCheckConnectionFactory) {
RabbitTemplate rabbitTemplate = new RabbitTemplate(healthCheckConnectionFactory); // make sure it's a different connection factory than the one with guaranteed sorting
return new RabbitHealthIndicator(rabbitTemplate);
}
That did the trick.
Cheers and thank you Gary for your help.
Frank
I'm implementing a daily job which get data from a MongoDB (around 300K documents) and for each of them publish a message on a RabbitMQ queue.
On the other side I have some consumers on the same queue, which ideally should work in parallel.
Everything is working but not as much as I would, specially regarding consumers performances.
This is how I declare the queue:
rabbitMQ.getChannel().queueDeclare(QUEUE_NAME, true, false, false, null);
This is how the publishing is done:
rabbitMQ.getChannel().basicPublish("", QUEUE_NAME, null, body.getBytes());
So the channel used to declare the queue is used to publish all the messages.
And this is how the consumers are instantiated in a for loop (10 in total, but it can be any number):
Channel channel = rabbitMQ.getConnection().createChannel();
MyConsumer consumer = new MyConsumer(customMapper, channel, subscriptionUpdater);
channel.basicQos(1); // also tried with 0, 10, 100, ...
channel.basicConsume(QUEUE_NAME, false, consumer);
So for each consumer I create a new channel and this is confirmed by logs:
...
com.rabbitmq.client.impl.recovery.AutorecoveringChannel#bdd2027
com.rabbitmq.client.impl.recovery.AutorecoveringChannel#5d1b9c3d
com.rabbitmq.client.impl.recovery.AutorecoveringChannel#49a26d19
...
As far as I've understood from my very short RabbitMQ experience, this should guarantee that all the consumer are called.
By the way, consumers need between 0.5 to 1.2 seconds to complete their task. I have just spotted very few 3 seconds.
I have two separate queues and I repeat what I said above two times (using the same RabbitMQ connection).
So, I have tested publishing 100 messages for each queue. Both of them have 10 consumers with qos=1.
I didn't expect to have exactly a delivery/consume performance of 10/s, instead I noticed:
actual values are around 0.4 and 1.0.
at least all the consumers bound to the queue have received a message, but it doesn't look like "fair dispatching".
it took about 3 mins 30 secs to consume all the messages on both queues.
Am I missing the main concept of threading within RabbitMQ? Or any specific configuration which might be still at default value?
I'm on it since very few days so this might be possible.
Please notice that I'm in the fortunate position where I can control both publishing and consuming parts :)
I'm using RabbitMQ 3.7.3 locally, so it cannot be any network latency issue.
Thanks for your help!
The setup of RabbitMQ channels and consumers were correct in the end: so one channel for each consumer.
The problem was having the consumers calling a synchronized method to find and update a MongoDB document.
This was delaying the execution time of some consumers: even worst, the more consumers I was adding (thinking to speed up processing), the less message rate/s I was getting.
I have moved the MongoDB part on he publishing side where I don't have to care about synchronization because it's done in sequence by one publisher only. I have a slightly decreased delivery rate/s but now with just 5 consumers I easily reach an ack rate of 50-60/s.
Lessons learnt:
create a separate channel for the publisher.
create a separate channel for each consumer.
let RabbitMQ manage threading for the consumers (--> you can instantiate them on the main thread).
(if possible) back off publishing to give the queues 100% time to deal with consumers.
set a qos > 1 for each consumer channel. But this really depends on your scenario and architecture: you must do some performance test.
As a general rule:
(1) calculate/estimate delivery time.
(2) calculate/estimate ack time.
(3) calculate/estimate consumer time.
qos = (1) + (2) + (3) / (3)
This will give you an initial qos value to test and tweak based on your scenario. The final goal is to have 100% utilization for all the available consumers.
I had a question on how rabbitmq works with batching acknowledgements. I understand that the Prefetch value is the max number of messages that will get queued before reaching its limit. However, I wasn't sure if the ack's manage themselves or if I have to manage this in code.
Which method is correct?
Send each basicAck with multiple set to true
or
wait until 10 acks were supposed to be sent out and send only the last one and AMQP will automatically send all previous in queue. (with multiple set to true)
TL;DR multiple = true is faster in some cases but requires a lot more careful book keeping and batch like requirements
The consumer gets messages that have a monotonic-ly growing id specific to that consumer. The id is a 64 bit number (it actually might be an unsigned 32 bit but since Java doesn't have that its a long) called the delivery tag. The prefetch is the most messages a consumer will receive that are unacked.
When you ack the highest delivery tag with multiple true it will acknowledge all the unacked messages with a lower delivery tag (smaller number) that the consumer has outstanding. Obviously if you have high prefetch this is faster than acking each message.
Now RabbitMQ knows the consumer received the messages (the unacked ones) but it doesn't know if all those messages have been correctly consumed. So it is on the burden of you the developer to make sure all the previous messages have been consumed. The consumer will deliver the messages in order (I believe internally the client uses a BlockingQueue) but depending on the library/client used downstream the messages might not be.
Thus this really only works well when you are batching the messages together in a single go (e.g. transaction or sending a group of messages off to some other system) or buffering reliably. Often this is done with a blocking queue and then periodically draining the queue to send a group of messages to a downstream system.
On the other hand if you are streaming each message in real time then you can't really do this (ie multiple = false).
There is also the case of one of the message being bad in the group (e.g. drained from internal queue... not rabbit queue) and you won't to nack that bad one. If that is the case you can't use multiple = true either.
Finally if you wait for a certain amount messages (instead of say time) more than the prefetch you will wait indefinitely.... not a good idea. You need to wait on time and number of messages must be <= prefetch.
As you can see its fairly nontrivial to correctly use multiple = true.
First one correction regarding Prefetch value is the max number of messages that will get queued before reaching its limit. - this is not what prefetch value is; prefetch value is the number of UN-ACKed messages that consumer "gets" from the queue. So they are kind of assigned to the consumer but remain in the queue until they are acknowledged. Quote from here, when prefetch is 1
This tells RabbitMQ not to give more than one message to a worker at a
time. Or, in other words, don't dispatch a new message to a worker
until it has processed and acknowledged the previous one.
And for your question:
I wasn't sure if the ack's manage themselves or if I have to manage
this in code.
You can set the auto ack flag to true and then you could say that the ack's manage themselves
I am using ActiveMQ 5.8 with wildcard consumers configured in camel route.
I am using default ActiveMQ configuration, so I have defaults as below
prefetch = 1
dispatch policy= Round Robin
Now I start a consumer jvm with 5 consumers each for 2 queues. both the queue has same type of message and same number of messages.
Consumers are doing nothing but printing the message (so no db blocking or slow consumer issue)
EDIT
I have set preFetch to 1 for each of the queue
What I observe is one of the queue getting drained faster than other.
What I expect is both the queue getting drained at equal pace, kind of load balance.
One surprising observation is
Though activemq webconsole shows 5 consumers for each of those queues
When I debug my consumer, I see only 5 threads / consumers from camel flow for a wildcard queue *.processQueue
What will be cause of above behavior?
How do I make sure that all the queue drain at equal pace?
Did anyone has experience to share on writting custom dispatch policy or overriding defaults of activemq?
I was able to find a reference to this behavior
Message distribution in case of wildcard queue consumers is random.
http://activemq.2283324.n4.nabble.com/Wildcard-and-message-distribution-td2346132.html#a2346133
Though this can be tuned by setting appropriate prefetch size.
After trial & error, I arrived at following formula, to have fair distribution across the consumers and all the queue getting de-queued at almost same pace.
prefetch = number of wildcard consumers
It's probably wrong to compare the rate the queues are consumed. The load balancing typically happens between consumers. So, the idea is that each of the five consumers on the first queue would get rather even load (given they are connected to the same broker).
However, I think you might want to double check your load test setup. It rarely gives predictable results when running broker and consumers on the same machine for instance.
I'm trying to understand the best way to coalesce or chunk incoming messages in RabbitMQ (using Spring AMQP or the Java client directly).
In other words I would like to take say 100 incoming messages and combine them as 1 and resend it to another queue in a reliable (correctly ACKed way). I believe this is called the aggregator pattern in EIP.
I know Spring Integration provides an aggregator solution but the implementation looks like its not fail safe (that is it looks like it has to ack and consume messages to build the coalesced message thus if you shutdown it down while its doing this you will loose messages?).
I can't comment directly on the Spring Integration library, so I'll speak generally in terms of RabbitMQ.
If you're not 100% convinced by the Spring Integration implementation of the Aggregator and are going to try to implement it yourself then I would recommend avoiding using tx which uses transactions under the hood in RabbitMQ.
Transactions in RabbitMQ are slow and you will definitely suffer performance problems if you're building a high traffic/throughput system.
Rather I would suggest you take a look at Publisher Confirms which is an extension to AMQP implemented in RabbitMQ. Here is an introduction to it when it was new http://www.rabbitmq.com/blog/2011/02/10/introducing-publisher-confirms/.
You will need to tweak the prefetch setting to get the performance right, take a look at http://www.rabbitmq.com/blog/2012/05/11/some-queuing-theory-throughput-latency-and-bandwidth/ for some details.
All the above gives you some background to help solve your problem. The implementation is rather straightforward.
When creating your consumer you will need to ensure you set it so that ACK is required.
Dequeue n messages, as you dequeue you will need to make note of the DeliveryTag for each message (this is used to ACK the message)
Aggregate the messages into a new message
Publish the new message
ACK each dequeued message
One thing to note is that if your consumer dies after 3 and before 4 has completed then those messages that weren't ACK'd will be reprocessed when it comes back to life
If you set the <amqp-inbound-channel-adapter/> tx-size attribute to 100, the container will ack every 100 messages so this should prevent message loss.
However, you might want to make the send of the aggregated message (on the 100th receive) transactional so you can confirm the broker has the message before the ack for the inbound messages.