I am using ActiveMQ 5.8 with wildcard consumers configured in a Camel route.
I am using the default ActiveMQ configuration, so I have the defaults as below:
prefetch = 1
dispatch policy= Round Robin
Now I start a consumer JVM with 5 consumers each for 2 queues. Both queues have the same type of messages and the same number of messages.
The consumers are doing nothing but printing the message (so no DB blocking or slow-consumer issue).
EDIT
I have set prefetch to 1 for each of the queues.
What I observe is that one of the queues gets drained faster than the other.
What I expect is both queues getting drained at an equal pace, a kind of load balancing.
One surprising observation: though the ActiveMQ web console shows 5 consumers for each of those queues, when I debug my consumer I see only 5 threads / consumers from the Camel flow for the wildcard queue *.processQueue.
What is the cause of the above behavior?
How do I make sure that all the queues drain at an equal pace?
Does anyone have experience to share on writing a custom dispatch policy or overriding ActiveMQ's defaults?
I was able to find a reference to this behavior
Message distribution in case of wildcard queue consumers is random.
http://activemq.2283324.n4.nabble.com/Wildcard-and-message-distribution-td2346132.html#a2346133
Though this can be tuned by setting an appropriate prefetch size.
After trial & error, I arrived at the following formula to get a fair distribution across the consumers, with all the queues getting de-queued at almost the same pace:
prefetch = number of wildcard consumers
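For illustration, here is a minimal sketch of setting that prefetch programmatically on the consumer side, assuming a plain org.apache.activemq.ActiveMQConnectionFactory (the broker URL and the value 5 are only examples matching the 5 wildcard consumers above):
ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
ActiveMQPrefetchPolicy prefetchPolicy = new ActiveMQPrefetchPolicy();
prefetchPolicy.setQueuePrefetch(5); // prefetch = number of wildcard consumers
factory.setPrefetchPolicy(prefetchPolicy); // hand this factory to the Camel ActiveMQ component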
It's probably wrong to compare the rate at which the queues are consumed. The load balancing typically happens between consumers. So the idea is that each of the five consumers on the first queue would get a rather even load (given they are connected to the same broker).
However, I think you might want to double-check your load test setup. It rarely gives predictable results when running the broker and consumers on the same machine, for instance.
Related
I'm implementing a daily job which gets data from MongoDB (around 300K documents) and, for each of them, publishes a message on a RabbitMQ queue.
On the other side I have some consumers on the same queue, which ideally should work in parallel.
Everything is working, but not as well as I would like, especially regarding consumer performance.
This is how I declare the queue:
rabbitMQ.getChannel().queueDeclare(QUEUE_NAME, true, false, false, null);
This is how the publishing is done:
rabbitMQ.getChannel().basicPublish("", QUEUE_NAME, null, body.getBytes());
So the channel used to declare the queue is used to publish all the messages.
And this is how the consumers are instantiated in a for loop (10 in total, but it can be any number):
Channel channel = rabbitMQ.getConnection().createChannel();
MyConsumer consumer = new MyConsumer(customMapper, channel, subscriptionUpdater);
channel.basicQos(1); // also tried with 0, 10, 100, ...
channel.basicConsume(QUEUE_NAME, false, consumer);
So for each consumer I create a new channel and this is confirmed by logs:
...
com.rabbitmq.client.impl.recovery.AutorecoveringChannel#bdd2027
com.rabbitmq.client.impl.recovery.AutorecoveringChannel#5d1b9c3d
com.rabbitmq.client.impl.recovery.AutorecoveringChannel#49a26d19
...
As far as I've understood from my very short RabbitMQ experience, this should guarantee that all the consumers are called.
By the way, consumers need between 0.5 and 1.2 seconds to complete their task. I have spotted only a very few taking 3 seconds.
I have two separate queues and I repeat what I said above two times (using the same RabbitMQ connection).
So, I have tested publishing 100 messages for each queue. Both of them have 10 consumers with qos=1.
I didn't expect to have exactly a delivery/consume rate of 10/s, but instead I noticed:
actual values are around 0.4 and 1.0;
at least all the consumers bound to the queue have received a message, but it doesn't look like "fair dispatching";
it took about 3 mins 30 secs to consume all the messages on both queues.
Am I missing the main concept of threading within RabbitMQ? Or is there some specific configuration which might still be at its default value?
I've only been at this for a very few days, so that might well be the case.
Please notice that I'm in the fortunate position where I can control both publishing and consuming parts :)
I'm using RabbitMQ 3.7.3 locally, so it can't be a network latency issue.
Thanks for your help!
The setup of RabbitMQ channels and consumers was correct in the end: one channel for each consumer.
The problem was having the consumers call a synchronized method to find and update a MongoDB document.
This was delaying the execution time of some consumers: even worse, the more consumers I added (thinking it would speed up processing), the lower the message rate/s I was getting.
I have moved the MongoDB part to the publishing side, where I don't have to care about synchronization because it is done in sequence by one publisher only. I have a slightly decreased delivery rate/s, but now with just 5 consumers I easily reach an ack rate of 50-60/s.
Lessons learnt:
create a separate channel for the publisher.
create a separate channel for each consumer.
let RabbitMQ manage threading for the consumers (--> you can instantiate them on the main thread).
(if possible) back off publishing so the queues can spend 100% of their time serving the consumers.
set a qos > 1 for each consumer channel. But this really depends on your scenario and architecture: you must do some performance tests.
As a general rule:
(1) calculate/estimate delivery time.
(2) calculate/estimate ack time.
(3) calculate/estimate consumer time.
qos = ((1) + (2) + (3)) / (3)
This will give you an initial qos value to test and tweak based on your scenario. The final goal is to have 100% utilization for all the available consumers.
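As a rough worked example (all the numbers are made up): with an estimated delivery time of 5 ms, ack time of 5 ms and consumer time of 50 ms, qos = (5 + 5 + 50) / 50 = 1.2, so you would start testing around 2:
int qos = (int) Math.ceil((5.0 + 5.0 + 50.0) / 50.0); // ((1) + (2) + (3)) / (3) -> 2
channel.basicQos(qos);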
Sometimes, due to external problems, I need to requeue a message with basic.reject with requeue = true.
But I don't need to consume it immediately, because it will possibly fail again in a short time. If I continuously requeue it, this may result in an infinite requeue loop.
So I need to consume it later, say one minute later.
And I need to know how many times the message has been requeued, so that I can stop requeuing it and instead just reject it, to mark it as having failed to be consumed.
PS: I am using Java client.
There are multiple solutions to point 1.
The first one is the one chosen by Celery (a Python producer/consumer library that can use RabbitMQ as a broker). Inside your message, add a timestamp at which the task should be executed. When your consumer gets the message, do not ack it and check its timestamp. As soon as the timestamp is reached, the worker can execute the task. (Note that the worker can continue working on other tasks instead of waiting.)
This technique has some drawbacks. You have to increase the QoS per channel to an arbitrary value. And if your worker is already working on a long-running task, the delayed task won't be executed until the first task has finished.
A second technique is RabbitMQ-only and much more elegant. It takes advantage of dead-letter exchanges and message TTLs. You create a new queue which isn't consumed by anybody. This queue has a dead-letter exchange that will forward the messages to the consumer queue. When you want to defer a message, ack it (or reject it without requeue) on the consumer queue and copy the message into the dead-lettered queue with a TTL equal to the delay you want (say one minute). At (roughly) the end of the TTL, the deferred message will magically land in the consumer queue again, ready to be consumed. The RabbitMQ team has also made the Delayed Message Plugin (the plugin is marked as experimental yet fairly stable and potentially suitable for production use as long as the user is aware of its limitations; it has serious limitations in terms of scalability and reliability in case of failover, so you should decide whether you really want to use it in production, or whether you prefer to stick to the manual way, which is limited to one TTL per queue).
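A minimal sketch of that dead-letter + TTL pattern with the Java client (the queue names "work" and "work.delay", the routing key and the 60000 ms TTL are placeholders):
Map<String, Object> args = new HashMap<>();
args.put("x-dead-letter-exchange", "");            // default exchange
args.put("x-dead-letter-routing-key", "work");     // expired messages are routed back to the consumer queue
args.put("x-message-ttl", 60000);                  // one-minute delay
channel.queueDeclare("work.delay", true, false, false, args); // nobody consumes this queue
channel.queueDeclare("work", true, false, false, null);
// To defer a message from a consumer on "work":
channel.basicAck(deliveryTag, false);                       // or basicReject(deliveryTag, false)
channel.basicPublish("", "work.delay", properties, body);   // parked for ~1 minute, then dead-lettered back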
Point 2 just requires putting a counter in your message and handling it inside your app. You can choose to put this counter in a header or directly in the body.
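For point 2, a possible sketch using a custom header (the header name "x-retries" and MAX_RETRIES are illustrative, not something RabbitMQ defines):
Map<String, Object> headers = properties.getHeaders();
int retries = (headers != null && headers.get("x-retries") != null) ? ((Number) headers.get("x-retries")).intValue() : 0;
if (retries >= MAX_RETRIES) {
    channel.basicReject(deliveryTag, false); // give up, don't requeue
} else {
    // republish a copy with x-retries = retries + 1 (e.g. into the delayed queue above), then ack the original
}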
We're using ActiveMQ (5.14.5).
We have a single producer, and multiple consumers on the same queue.
From time to time we set JMSXGroupID to group several messages together to be consumed on a single consumer. This works as expected.
In parallel, the producer continues to send non-grouped messages (i.e. without JMSXGroupID).
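For context, this is roughly how the grouped and non-grouped messages are produced with plain JMS (the group id value is only an example):
TextMessage grouped = session.createTextMessage(payload);
grouped.setStringProperty("JMSXGroupID", "GROUP-A");  // grouped message
producer.send(grouped);
TextMessage ungrouped = session.createTextMessage(payload);
producer.send(ungrouped);                             // no JMSXGroupID, normal round-robin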
The problem:
We noticed that once a consumer has been selected to process a specific group, it no longer gets non-grouped messages, even if it is completely idle. The non-grouped messages are always sent to the other consumers.
The rogue consumer returns to consume non-grouped messages only after we close the group that was assigned to it (by setting JMSXGroupSeq=-1).
Is this normal behavior? We expected that non-grouped messages would continue to be delivered in the same round-robin fashion as usual, to all consumers.
We were unable to find a clear reference to this in ActiveMQ documentation.
There's a bit of a no-win situation for the message broker here. If there are active message groups in play, the broker has to assume that further messages will be produced that fall into those groups. So a message consumer that has become bound to a particular group needs to remain available to consume later messages of that group, rather than ungrouped messages. After all, an ungrouped message can be handled elsewhere, while a grouped message can't.
However, we also want to have a fair-ish distribution of messages between consumers. So it makes sense that a consumer that is bound to a group, or groups, could take some work when it is idle.
But how do we know it is idle? What happens if a consumer takes a bunch of ungrouped messages (and don't forget the default pre-fetch behaviour), and then new messages arrive that match its specific group?
The fact that closing a group restores the "group consumer" to default behaviour suggests to me that this is not a bug, but a deliberate attempt to make a reasonable compromise in a tricky situation. It seems reasonable to me to ask for a feature to be added, where "group consumers" can take part in ungrouped workload, but I would be inclined to see that as an enhancement.
Just my $0.02, of course.
I had a question on how rabbitmq works with batching acknowledgements. I understand that the Prefetch value is the max number of messages that will get queued before reaching its limit. However, I wasn't sure if the ack's manage themselves or if I have to manage this in code.
Which method is correct?
Send each basicAck with multiple set to true
or
wait until 10 acks are due to be sent out, then send only the last one with multiple set to true, and AMQP will automatically acknowledge all the previous ones in the queue.
TL;DR: multiple = true is faster in some cases, but it requires a lot more careful bookkeeping and batch-like requirements.
The consumer gets messages that have a monotonically growing id specific to that consumer. The id is a 64-bit number (it might actually be an unsigned 32-bit, but since Java doesn't have that it's a long) called the delivery tag. The prefetch is the maximum number of unacked messages a consumer will receive.
When you ack the highest delivery tag with multiple set to true, it will acknowledge all the unacked messages with a lower delivery tag (smaller number) that the consumer has outstanding. Obviously, if you have a high prefetch, this is faster than acking each message individually.
Now RabbitMQ knows the consumer received the messages (the unacked ones), but it doesn't know whether all those messages have been correctly consumed. So the burden is on you, the developer, to make sure all the previous messages have been consumed. The consumer will deliver the messages in order (I believe internally the client uses a BlockingQueue), but depending on the library/client used, the messages might not be processed in order downstream.
Thus this really only works well when you are batching the messages together in a single go (e.g. a transaction, or sending a group of messages off to some other system) or buffering reliably. Often this is done with a blocking queue that is periodically drained to send a group of messages to a downstream system.
On the other hand, if you are streaming each message in real time, then you can't really do this (i.e. you have to use multiple = false).
There is also the case where one of the messages in the group is bad (e.g. drained from the internal queue... not the Rabbit queue) and you want to nack that bad one. If that is the case, you can't use multiple = true either.
Finally, if you wait for a certain number of messages (instead of, say, a certain amount of time) that is greater than the prefetch, you will wait indefinitely... not a good idea. You need to wait on time as well, and the number of messages must be <= prefetch.
As you can see, it's fairly nontrivial to use multiple = true correctly.
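To make that concrete, here is a minimal sketch with the Java client (the batch size of 25, the prefetch of 50 and process() are placeholders; a real implementation would also flush on a timer, per the point above):
channel.basicQos(50);
channel.basicConsume(QUEUE_NAME, false, new DefaultConsumer(channel) {
    private int pending = 0;
    @Override
    public void handleDelivery(String tag, Envelope env, AMQP.BasicProperties props, byte[] body) throws IOException {
        process(body);                                    // your own processing / buffering
        if (++pending >= 25) {                            // batch size must stay <= prefetch
            channel.basicAck(env.getDeliveryTag(), true); // acks everything up to and including this tag
            pending = 0;
        }
    }
});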
First, one correction regarding "Prefetch value is the max number of messages that will get queued before reaching its limit" - this is not what the prefetch value is; the prefetch value is the number of un-acked messages that the consumer "gets" from the queue. So they are kind of assigned to the consumer, but remain in the queue until they are acknowledged. Quote from here, when prefetch is 1:
This tells RabbitMQ not to give more than one message to a worker at a time. Or, in other words, don't dispatch a new message to a worker until it has processed and acknowledged the previous one.
And for your question:
I wasn't sure if the ack's manage themselves or if I have to manage
this in code.
You can set the auto-ack flag to true, and then you could say that the acks manage themselves.
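With the Java client that is just the second argument to basicConsume (true means auto-ack, so no explicit basicAck calls are needed):
channel.basicConsume(QUEUE_NAME, true, consumer); // autoAck = true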
I'm occasionally getting the following EJB exception across several different message driven beans:
javax.ejb.EJBException: Failed to acquire the pool semaphore, strictTimeout=10000
This behavior closely corresponds to times when a particular database is having issues, which increases the amount of time spent in the MDB's onMessage function. The messages are being delivered by an ActiveMQ broker (version 5.4.2). The prefetch on the MDBs is 2000 (20 sessions x 100 messages per session).
My question is a general one. What exactly is happening here? I know that a message which has been delivered to the server running the MDB will time out after 10 seconds if there is no instance in the bean pool to handle it, but how has that message been delivered to the server in the first place? My assumption up to this point was that the MDB requests messages from the broker, in the quantity of the prefetch, only when it no longer has any messages to process. Are they simply waiting in that server-side "bucket" for too long?
Has anyone else run into this? Suggestions for tuning prefetch/semaphore timeout?
EDIT: Forgot to mention I'm using JBoss AS 5.1.0
After doing some research I've found a satisfactory explanation for this EJBException.
Message-driven beans have an instance pool. When a batch of JMS messages is delivered to an MDB in the quantity of the prefetch, each message is assigned an instance from this pool and is delivered to that instance via the onMessage function.
A little about how the pool works: in JBoss 5.1.0, pooled beans such as MDBs and SessionBeans are configured by default through JBoss AOP, specifically a file in the deploy directory titled "ejb3-interceptors-aop.xml". This file creates interceptor bindings and default annotations for any class matching its domain. In the case of the Message Driven Bean domain this includes, among other things, an org.jboss.ejb3.annotation.Pool annotation:
<annotation expr="class(*) AND !class(#org.jboss.ejb3.annotation.Pool)">
#org.jboss.ejb3.annotation.Pool (value="StrictMaxPool", maxSize=15, timeout=10000)
</annotation>
The parameters of that annotation are described here.
Herein lies the rub. If the message prefetch exceeds the maxSize of this pool (which it usually will for high-throughput messaging applications), you will necessarily have messages waiting for an MDB instance. If the time from message delivery to calling onMessage exceeds the pool timeout for any message, an EJBException is thrown. This may not be an issue for the first few iterations of the message distribution, but if you have a large prefetch and a long average onMessage time, the messages towards the end of the queue will begin to fail.
Some quick algebra reveals that this occurs, roughly speaking, when
timeout < (prefetch x onMessageTime) / maxSize
This assumes that messages are distributed instantaneously and that each onMessage takes the same amount of time, but it should give you a rough estimate of whether you're way out of bounds.
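With the numbers from the question and the default pool, that gives 10000 < (2000 x onMessageTime) / 15, i.e. the exceptions start appearing once the average onMessage time exceeds roughly 75 ms.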
The solution to this problem is more subjective. Simply increasing the timeout is a naive option, because it will mask the fact that messages are sitting on your application server instead of your queue. Given that onMessage time is somewhat fixed, decreasing the prefetch is most likely a good option, as is increasing the pool size if resources allow. In tuning this, I decreased the timeout in addition to decreasing the prefetch substantially and increasing maxSize, to keep messages on the queue for longer while maintaining my alert indicator for when onMessage times are higher than normal.
What jpredham says is correct. Also, please check whether 'strictMaximumSize' is set to true, which could lead to https://issues.jboss.org/browse/JBAS-1599.