I'm occasionally getting the following EJB exception across several different message driven beans:
javax.ejb.EJBException: Failed to acquire the pool semaphore, strictTimeout=10000
This behavior closely corresponds to periods when a particular database is having issues, which in turn increases the amount of time spent in the MDBs' onMessage method. The messages are being delivered by an ActiveMQ broker (version 5.4.2). The prefetch on the MDBs is 2000 (20 sessions x 100 messages per session).
My question is a general one. What exactly is happening here? I know that a message which has been delivered to the server running the MDB will time out after 10 seconds if there is no instance in the bean pool to handle it, but how was that message delivered to the server in the first place? My assumption up to this point has been that the MDB requests messages from the broker (in the quantity of the prefetch) only when it no longer has any messages to process. Are they simply waiting in that server-side "bucket" for too long?
Has anyone else run into this? Any suggestions for tuning the prefetch or the semaphore timeout?
EDIT: Forgot to mention I'm using JBoss AS 5.1.0
After doing some research I've found a satisfactory explanation for this EJBException.
MessageDrivenBeans have an instance pool. When a batch of JMS messages (up to the prefetch) is delivered to an MDB, each message is assigned an instance from this pool and delivered to that instance via the onMessage method.
A little about how the pool works: in JBoss 5.1.0, pooled beans such as MDBs and SessionBeans are configured by default through JBoss AOP, specifically a file in the deploy directory titled "ejb3-interceptors-aop.xml". This file creates interceptor bindings and default annotations for any class matching its domain. In the case of the Message Driven Bean domain it supplies, among other things, an org.jboss.ejb3.annotation.Pool annotation:
<annotation expr="class(*) AND !class(@org.jboss.ejb3.annotation.Pool)">
@org.jboss.ejb3.annotation.Pool (value="StrictMaxPool", maxSize=15, timeout=10000)
</annotation>
The parameters of that annotation are described here.
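If you'd rather not change ejb3-interceptors-aop.xml globally, the same annotation can, as far as I can tell, be applied per bean instead. A minimal sketch (the bean name, destination, and numbers are made up for illustration; the @Pool attributes are the ones shown in the binding above):

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;
import org.jboss.ejb3.annotation.Pool;

// Overrides the default StrictMaxPool settings for this bean only.
@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destination", propertyValue = "queue/ExampleQueue")
})
@Pool(value = "StrictMaxPool", maxSize = 50, timeout = 30000)
public class ExampleMdb implements MessageListener {
    public void onMessage(Message message) {
        // process the message
    }
}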
Herein lies the rub. If the message prefetch exceeds the maxSize of this pool (which it usually will for high-throughput messaging applications), you will necessarily have messages waiting for an MDB instance. If, for any message, the time from delivery to the onMessage call exceeds the pool timeout, an EJBException will be thrown. This may not be an issue for the first few iterations of the message distribution, but if you have a large prefetch and a long average onMessage time, the messages toward the end of the batch will begin to fail.
Some quick algebra reveals that this will occur, roughly speaking, when
timeout < (prefetch x onMessageTime) / maxSize
This assumes that messages are distributed instantaneously and that each onMessage call takes the same amount of time, but it should give you a rough estimate of whether you're way out of bounds.
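For example, plugging in the numbers from the question above and the default pool settings (prefetch = 2000, maxSize = 15, timeout = 10000 ms), and assuming an average onMessage time of 100 ms purely for illustration:

(prefetch x onMessageTime) / maxSize = (2000 x 100 ms) / 15 ≈ 13,333 ms > 10,000 ms

so even a fairly modest onMessage time pushes the later messages in the batch past the pool timeout.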
The solution to this problem is more subjective. Simply increasing the timeout is a naive option, because it masks the fact that messages are sitting on your application server instead of your queue. Given that onMessage time is somewhat fixed, decreasing the prefetch is most likely a good option, as is increasing the pool size if resources allow. In tuning this I decreased the timeout in addition to decreasing the prefetch substantially and increasing maxSize, to keep messages on the queue for longer while maintaining my alert indicator for when onMessage times are higher than normal.
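On the ActiveMQ side, the "20 sessions x 100 messages per session" prefetch is, I believe, driven by the resource adapter's activation properties, so reducing it per MDB would look roughly like this (property names assume the ActiveMQ resource adapter and may differ by version; destination and values are illustrative):

import javax.ejb.ActivationConfigProperty;
import javax.ejb.MessageDriven;
import javax.jms.Message;
import javax.jms.MessageListener;

@MessageDriven(activationConfig = {
    @ActivationConfigProperty(propertyName = "destinationType", propertyValue = "javax.jms.Queue"),
    @ActivationConfigProperty(propertyName = "destination", propertyValue = "queue/ExampleQueue"),
    // fewer sessions and fewer messages per session => smaller effective prefetch,
    // so excess messages stay on the broker instead of waiting for a pool instance
    @ActivationConfigProperty(propertyName = "maxSessions", propertyValue = "10"),
    @ActivationConfigProperty(propertyName = "maxMessagesPerSessions", propertyValue = "10")
})
public class TunedMdb implements MessageListener {
    public void onMessage(Message message) { /* process */ }
}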
What jpredham says is correct. Also, please check whether
strictMaximumSize is set to true,
which could lead to https://issues.jboss.org/browse/JBAS-1599
Scenario/Use Case:
I have a Spring Boot application using Spring for Kafka to send messages to Kafka topics. Upon completion of a specific event (triggered by an HTTP request) a new thread is created (via Spring @Async) which calls kafkaTemplate.send() and registers a callback on the ListenableFuture that it returns. The original thread which handled the HTTP request returns a response to the calling client and is freed.
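For reference, the publish path looks roughly like this (class and topic names changed; this assumes a Spring Kafka version where KafkaTemplate.send() returns a ListenableFuture):

import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.support.SendResult;
import org.springframework.scheduling.annotation.Async;
import org.springframework.stereotype.Service;
import org.springframework.util.concurrent.ListenableFuture;
import org.springframework.util.concurrent.ListenableFutureCallback;

@Service
public class EventPublisher {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public EventPublisher(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Runs on a separate thread so the HTTP request thread can return immediately.
    @Async
    public void publish(String payload) {
        ListenableFuture<SendResult<String, String>> future =
                kafkaTemplate.send("example-topic", payload);
        future.addCallback(new ListenableFutureCallback<SendResult<String, String>>() {
            @Override
            public void onSuccess(SendResult<String, String> result) {
                // log success
            }
            @Override
            public void onFailure(Throwable ex) {
                // log failure / dead-letter handling
            }
        });
    }
}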
Normal Behavior:
Under normal application load I've verified that the individual messages are all published to the topic as desired (application log entries upon callback success or failure, as well as viewing the messages in the topic on the Kafka cluster). If I bring down all Kafka brokers for 3-5 minutes and then bring the cluster back online, the application's publisher immediately re-establishes its connection to Kafka and proceeds with publishing messages.
Problem Behavior:
However, when performing load testing, if I bring down all Kafka brokers for 3-5 minutes and then bring the cluster back online, the Spring application's publisher continues to show failures for all publish attempts. This continues for approximately 7 hours, at which time the publisher is able to successfully re-establish communication with Kafka again (usually this is preceded by a broken pipe exception, but not always).
Current Findings:
While performing the load test, for approx. 10 minutes, I connected to the application using JConsole and monitored the producer metrics exposed via kafka.producer. Within the first approx. 30 seconds of heavy load, buffer-available-bytes decreases until it reaches 0 and stays at 0. waiting-threads remains between 6 and 10 (alternating every time I hit refresh), and buffer-available-bytes remains at 0 for approx. 6.5 hours. After that, buffer-available-bytes shows all of the originally allocated memory restored, but Kafka publish attempts continue failing for approx. another 30 minutes before the desired behavior finally returns.
Current Producer Configuration:
request.timeout.ms=3000
max.retry.count=2
max.inflight.requests=1
max.block.ms=10000
retry.backoff.ms=3000
All other properties are using their default values
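For clarity, here is the same configuration expressed with the standard producer property constants, assuming max.retry.count and max.inflight.requests above correspond to retries and max.in.flight.requests.per.connection:

import java.util.Properties;
import org.apache.kafka.clients.producer.ProducerConfig;

public final class ProducerProps {
    public static Properties producerProps() {
        Properties props = new Properties();
        props.put(ProducerConfig.REQUEST_TIMEOUT_MS_CONFIG, "3000");
        props.put(ProducerConfig.RETRIES_CONFIG, "2");
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, "1");
        props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, "10000");
        props.put(ProducerConfig.RETRY_BACKOFF_MS_CONFIG, "3000");
        // all other properties are left at their defaults
        return props;
    }
}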
Questions:
Given my use case, would altering batch.size or linger.ms have any positive impact in terms of eliminating the issue encountered under heavy load?
Given that I have separate threads all calling kafkaTemplate.send() with separate messages and callbacks, and I have max.in.flight.requests.per.connection set to 1, are batch.size and linger.ms ignored beyond limiting the size of each message? My understanding is that no batching is actually occurring in this scenario and that each message is sent as a separate request.
Given that I have max.block.ms set to 10 seconds, why does buffer memory remain utilized for so long, and why do all messages continue to fail to be published for so many hours? My understanding is that after 10 seconds each new publish attempt should fail and return the failure callback, which in turn frees up the associated thread.
Update:
To try and clarify thread usage: I'm using a single producer instance as recommended in the JavaDocs. There are threads such as https-jsse-nio-22443-exec-* which handle incoming HTTPS requests. When a request comes in, some processing occurs, and once all non-Kafka-related logic completes a call is made to a method in another class annotated with @Async. This method makes the call to kafkaTemplate.send(). The response back to the client is shown in the logs before the publish to Kafka is performed (this is how I'm verifying it's being performed on a separate thread, as the service doesn't wait for the publish before returning a response).
There are task-scheduler-* threads which appear to be handling the callbacks from kafkaTemplate.send(). My guess is that the single kafka-producer-network-thread handles all of the publishing.
My application was making an HTTP request and sending each message to a dead-letter table on a database platform upon failure of each Kafka publish. The same threads being spun up to perform the publish to Kafka were being re-used for this call to the database. I moved the database call logic into another class and decorated it with its own @Async and a custom TaskExecutor. After doing this, I've monitored JConsole and can see that the calls to Kafka appear to be re-using the same 10 threads (TaskExecutor: corePoolSize 10, queueCapacity 0, maxPoolSize 80) and the calls to the database service are now using a separate thread pool (TaskExecutor: corePoolSize 10, queueCapacity 0, maxPoolSize 80) which is consistently closing and opening new threads but staying at a relatively constant number of threads. With this new behavior, buffer-available-bytes remains at a healthy constant level and the application's Kafka publisher successfully re-establishes its connection once the brokers are brought back online.
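For anyone interested, the change amounted to something like this (executor and bean names changed; a rough sketch rather than the exact code):

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.Async;
import org.springframework.scheduling.annotation.EnableAsync;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;
import org.springframework.stereotype.Service;

@Configuration
@EnableAsync
class AsyncConfig {
    // Dedicated pool for dead-letter DB calls so they no longer
    // compete with the threads that publish to Kafka.
    @Bean(name = "dbTaskExecutor")
    public ThreadPoolTaskExecutor dbTaskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(10);
        executor.setMaxPoolSize(80);
        executor.setQueueCapacity(0);
        executor.setThreadNamePrefix("db-deadletter-");
        return executor;
    }
}

@Service
class DeadLetterService {
    @Async("dbTaskExecutor")
    public void recordFailure(String payload, Throwable cause) {
        // write the failed message to the dead-letter table
    }
}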
I am using Axon with a distributed command bus, which uses JGroups for creating clusters. I fire approximately 100 messages.
I have the following configuration for tcp-gossip.xml:
sock_conn_timeout="300"
reaper_interval="0"
thread_pool.enabled="true"
thread_pool.min_threads="3"
thread_pool.max_threads="3"
max_bundle_timeout="10"
level="trace"
thread_pool.rejection_policy="Abort"
recv_buf_size="64K"
send_buf_size="20M"
/>
I get a java.util.concurrent.RejectedExecutionException
when running with this configuration, which is to be expected because the rejection_policy is Abort. But the rejected message is picked up again and executed, and the order of execution is preserved. That means the message is kept in a buffer somewhere.
1> Does anyone know where the messages are buffered in JGroups?
2> Can anyone explain what exactly happens when we use the Abort rejection_policy?
1) Messages are stored in the UNICASTx or pbcast.NAKACKx protocols until they are confirmed to have been received by all recipients.
2) JGroups has a non-trivial threading model. Messages are read from the network in the receive thread and then passed to one of three thread pools (regular, OOB, and internal) for processing in the stack and delivery to the application. When the thread pool is busy and there is no queue configured, or the queue is full, the thread pool will reject the job and the message will be discarded on the receiver side. Luckily, it will be resent later. You can monitor the number of rejected messages on TP.num_rejected_messages using JMX or probe.sh.
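If it helps, a quick way to read that counter from inside the JVM is a plain JMX query. This sketch assumes the channel was registered for JMX (e.g. via JmxConfigurator) under the default "jgroups" domain, and that the transport exposes the attribute under the name mentioned above (the exact name may vary by JGroups version):

import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public final class RejectedMessageMonitor {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Look at every JGroups MBean and print the rejection counter where it exists.
        for (ObjectName name : server.queryNames(new ObjectName("jgroups:*"), null)) {
            try {
                Object rejected = server.getAttribute(name, "num_rejected_messages");
                System.out.println(name + " -> num_rejected_messages=" + rejected);
            } catch (Exception attributeNotPresent) {
                // this MBean does not expose the attribute; skip it
            }
        }
    }
}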
I was poking around the RabbitMQ documentation, and it seems that RabbitMQ does not track a message redelivery count. If I were to manually ACK/NACK messages, I would need to either keep the retry count in memory (say, by using the correlationId as the unique key in a map) or set my own header in the message and redeliver it (thus putting it at the end of the queue).
However, this is a case that spring handles. Specifically, I am referring to RetryInterceptorBuilder.stateful().maxAttempts(x). Is this count specific to a JVM though, or is it manipulating the message somehow?
For example, I have a web-app deployed to 2 servers, with maxAttempts set to 5. Is it possible that the total redelivery count will be anywhere from 5-9, depending on the order in which it is redelivered and reprocessed among the 2 servers?
Rabbit/AMQP does not allow modification of the message when requeueing based on rejection.
The state (based on messageId) is maintained in a RetryContextCache; the default is a MapRetryContextCache. This is not really suitable for a "cluster" because, as you say, the attempts may be up to ((maxAttempts - 1) * n + 1), where n is the number of servers; plus it will cause a memory leak (state left on some servers). You can configure a SoftReferenceMapRetryContextCache in the RetryTemplate (RetryOperations in the builder), but that only addresses the memory leak.
You would need to use a custom RetryContextCache with some persistent shared store (e.g. redis).
I generally recommend using stateless recovery in this scenario - the retries are done entirely in the container and do not involve rabbit at all (until retries are exhausted, in which case the message is discarded or sent to the DLX/DLQ depending on broker configuration).
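A rough sketch of the stateless setup with Spring AMQP (the attempt count and back-off values are only illustrative); the resulting interceptor is then placed on the listener container factory's advice chain:

import org.springframework.amqp.rabbit.config.RetryInterceptorBuilder;
import org.springframework.amqp.rabbit.retry.RejectAndDontRequeueRecoverer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.retry.interceptor.RetryOperationsInterceptor;

@Configuration
class RetryConfig {
    @Bean
    public RetryOperationsInterceptor retryInterceptor() {
        // Retries happen entirely in the listener container; after 5 attempts the
        // recoverer rejects without requeue, so the broker drops the message or
        // routes it to the DLX/DLQ if one is configured on the queue.
        return RetryInterceptorBuilder.stateless()
                .maxAttempts(5)
                .backOffOptions(1000, 2.0, 10000) // initial interval, multiplier, max interval
                .recoverer(new RejectAndDontRequeueRecoverer())
                .build();
    }
}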
If you don't care about message order (and I presume you don't, given you have competing consumers), an interesting technique is to reject the message, send it to a DLQ with an expiry set, and, when the DLQ message expires, route it back to the tail of the original queue (rather than the head). In that case, the x-death header can be examined to determine how many times it has been retried.
This answer and this one have more details.
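For reference, that retry-via-DLQ topology can be declared along these lines with Spring AMQP (queue names and the TTL are invented; the default exchange is used to route the expired messages back):

import java.util.HashMap;
import java.util.Map;
import org.springframework.amqp.core.Queue;

class RetryTopology {

    // Work queue: rejected messages are dead-lettered to the retry queue.
    static Queue workQueue() {
        Map<String, Object> args = new HashMap<>();
        args.put("x-dead-letter-exchange", "");              // default exchange
        args.put("x-dead-letter-routing-key", "work.retry");
        return new Queue("work", true, false, false, args);
    }

    // Retry queue: messages expire after 30s and are dead-lettered back to the
    // tail of the work queue; consumers can inspect the x-death header to count
    // how many times a message has been through the loop.
    static Queue retryQueue() {
        Map<String, Object> args = new HashMap<>();
        args.put("x-message-ttl", 30000);
        args.put("x-dead-letter-exchange", "");
        args.put("x-dead-letter-routing-key", "work");
        return new Queue("work.retry", true, false, false, args);
    }
}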
I am using ActiveMQ 5.8 with wildcard consumers configured in a Camel route.
I am using default ActiveMQ configuration, so I have defaults as below
prefetch = 1
dispatch policy= Round Robin
Now I start a consumer JVM with 5 consumers each for 2 queues. Both queues have the same type of messages and the same number of messages.
The consumers do nothing but print the message (so there is no DB blocking or slow-consumer issue).
EDIT
I have set the prefetch to 1 for each of the queues.
What I observe is one of the queues getting drained faster than the other.
What I expect is both queues getting drained at an equal pace, a kind of load balancing.
One surprising observation:
though the ActiveMQ web console shows 5 consumers for each of those queues,
when I debug my consumer I see only 5 threads/consumers from the Camel flow for the wildcard queue *.processQueue.
What could be the cause of the above behavior?
How do I make sure that all the queues drain at an equal pace?
Does anyone have experience to share on writing a custom dispatch policy or overriding the ActiveMQ defaults?
I was able to find a reference to this behavior:
message distribution in the case of wildcard queue consumers is random.
http://activemq.2283324.n4.nabble.com/Wildcard-and-message-distribution-td2346132.html#a2346133
This can, however, be tuned by setting an appropriate prefetch size.
After trial and error, I arrived at the following formula to get a fair distribution across the consumers, with all the queues being de-queued at almost the same pace:
prefetch = number of wildcard consumers
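One way to apply that formula (with 5 wildcard consumers) is on the connection factory used by the Camel ActiveMQ component, sketched below with an illustrative broker URL; a destination option such as ?consumer.prefetchSize=5 on the endpoint should work as well:

import org.apache.activemq.ActiveMQConnectionFactory;

class PrefetchConfig {
    static ActiveMQConnectionFactory connectionFactory() {
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616");
        // prefetch = number of wildcard consumers (5 in this setup)
        factory.getPrefetchPolicy().setQueuePrefetch(5);
        return factory;
    }
}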
It's probably wrong to compare the rate at which the queues are consumed. The load balancing typically happens between consumers. So the idea is that each of the five consumers on the first queue would get a fairly even load (given they are connected to the same broker).
However, I think you might want to double-check your load test setup. It rarely gives predictable results when running the broker and consumers on the same machine, for instance.
I'm attempting to use the .NET Kaazing client to interact with a JMS back-end via WebSockets. I'm struggling to understand the correct usage of sessions. Initially, I had a single session shared across all threads, but I noticed that this is not supported:
A Session object is a single-threaded context for producing and consuming messages. Although it may allocate provider resources outside the Java virtual machine (JVM), it is considered a lightweight JMS object.
The reason I had a single session was just because I thought that would yield better performance. Since the documentation claimed sessions were lightweight, I had no hesitation switching my code over to use a session per "operation". By "operation" I mean either sending a single message, or subscribing to a queue/topic. In the former case, the session is short-lived and closed immediately after the message is sent. In the latter case, the session needs to live as long as the subscription is active.
When I tried creating multiple sessions I got an error:
System.NotSupportedException: Only one non-transacted session can be active at a time
Googling this error was fruitless, so I tried switching over to transacted sessions. But when attempting to create a consumer I get a different error:
System.NotSupportedException: This operation is not supported in transacted sessions
So it seems I'm stuck between a rock and a hard place. The only options I see are to share my session across threads, or to have a single non-transacted session used to create consumers and multiple transacted sessions for everything else. Both of these approaches seem a little against the grain to me.
Can anyone shed some light on the correct way for me to handle sessions in my client?
There are several ways to add concurrency to your application. You could use multiple Connections, but that is probably not desirable due to the increase in network overhead. A better approach would be to implement a simple mechanism for handling the concurrency in the MessageListener, either by dispatching Tasks or by delivering messages via ConcurrentQueues. Here are some choices for an implementation strategy:
The Task-based approach would use a TaskScheduler. In the MessageListener, a task would be scheduled to handle the work and the listener would return immediately. You might schedule a new Task per message, for instance. At that point the MessageListener returns and the next message is immediately available. This approach would be fine for low-throughput applications - e.g. a few messages per second - where you nevertheless need concurrency, perhaps because some messages take a long time to process.
Another approach would be to use a data structure of pending messages (a ConcurrentQueue). When the MessageListener is invoked, each Message would be added to the ConcurrentQueue and the listener would return immediately. A separate set of threads/tasks can then pull the messages from that ConcurrentQueue using an appropriate strategy for your application. This would work for a higher-performance application.
A variation of this approach would be to have a ConcurrentQueue for each Thread processing inbound messages. Here the MessageListener would not manage its own ConcurrentQueue, but instead it would deliver the messages to the ConcurrentQueue associated with each thread. For instance, if you have inbound messages representing stock feeds and also news feeds, one thread (or set of threads) could process the stock feed messages, and another could process inbound news items separately.
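As a rough illustration of the second approach, here is the hand-off pattern expressed in JMS/Java terms (Kaazing's .NET client mirrors the JMS API, so the .NET version would use a ConcurrentQueue or BlockingCollection and Tasks instead; the class name and pool size are invented):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import javax.jms.Message;
import javax.jms.MessageListener;

class QueueingListener implements MessageListener {

    private final BlockingQueue<Message> pending = new LinkedBlockingQueue<>();
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    QueueingListener() {
        // Worker threads drain the queue so onMessage can return immediately
        // and the session's dispatch thread is never blocked by slow processing.
        for (int i = 0; i < 4; i++) {
            workers.submit(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        Message message = pending.take();
                        process(message);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
    }

    @Override
    public void onMessage(Message message) {
        pending.offer(message); // hand off and return right away
    }

    private void process(Message message) {
        // application-specific work
    }
}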
Note that if you are using JMS Queues, each message will be acknowledged implicitly when your MessageListener returns. This may or may not be the behavior you want for your application.
For higher performance applications, you should consider approaches 2 and 3.