We are using ActiveMQ 5.3.1 as a standalone broker in our system, and every so often we get a big spike in messages (intentional, for example on failover, we re-subscribe). We currently have ProducerFlowControl turned on, as this seemed a sensible way to stop components from falling over during these spikes.
However, it seems we have an issue with the Flow Control - once it kicks in, the Producers seem to lock indefinitely, even once all inflight messages have been consumed. As soon as we see the message
Usage Manager memory limit (1048576) reached
Our producers can no longer send any messages to the topic. This seems odd - I thought it would be more of a "one-in-one-out" policy. I read somewhere that FlowControl does not work very well for Async Topic producers (which is exactly what we have) so I am wondering if there is a better way to configure this?
Also, how long does Flow Control last once it has kicked in? Will it throttle producers on that topic forever (until ActiveMQ is restarted? until producers are restarted?) or does it last a fixed or configurable amount of time (eg it waits for consumer to empty the topic, then waits 5 minutes)?
Any help would be appreciated. We are currently investigating turning Flow Control off and using File-based cursors instead. Any obvious downside with that approach?
Producer Flow Control occurs when you hit one of the size limits (memory, disk, etc.) for either the entire broker or a single destination. Once you hit it, producers on that destination (or the whole broker, depending on which limit you hit) are unable to send more messages until enough space frees up to hold the next one. So your one-out-one-in mental model is the right one (though if the messages are of different sizes, then it might not be truly one-for-one). This will continue for as long as the limit continues to be hit because producers are faster than consumers; it's not time-based, and it's not forever, just until consumers start catching up and you're not running into any limits.
If you're hitting PFC, it could mean that you haven't properly sized some limit in your broker to handle bursts of data or periods where consumers are offline (so you should stop and do that, not turn off PFC), or that you have a systemic problem where your producers are always going to outrun your consumers (so you need to either speed up your consumers, slow down your producers, find a way to allow multiple consumers to consume messages in parallel, or configure the broker to drop some of your messages for you).
Or it could mean that your topics have some leftover durable subscriptions for consumers that are currently (permanently?) offline, and the broker has to keep those messages so it can deliver them when the consumer comes back online (which it might never do). Because the broker's got to hold onto those messages, it can't allow any new ones to be sent, even though the currently online consumers have processed their copies of all of the messages. This one's my best guess based on what you've written, though it could certainly be other things instead.
In any case, PFC kicking in is almost certainly just a secondary symptom of something else wrong with how you've designed your system, and you should figure that out and fix it rather than just turning off PFC. (After all, if you're perpetually producing faster than you can consume, writing messages to disk is just going to run you out of disk space if you turn out of PFC.)
Most importantly, 5.3.1 is VERY, VERY old, and there have been a TON of improvements between then and 5.10.0 (the current released version). I wouldn't even consider running a broker on anything before 5.5.1 because of all of the bugs that were found in the versions before it, but really I wouldn't recommend running on anything before 5.8.0 and there'd have to be a good reason not to just go all the way to 5.10.0. And you're certainly not going to get any support for 5.3.1 from the community if you hit something that you think might be a bug, so do yourself a favor and upgrade to a version that was released more recently than 2010. (And hopefully just upgrading will fix the bad behavior you're seeing, but if not, you'll be in a better position to ask for someone to troubleshoot the problem.)
Related
I just work with an new project backed by RabbitMQ, and there are multiple consumer instances created listening to the same queue when the application starts. Howerver they shares the same connections with different channels.
The messages from the queue are massive(millions messages for one single producing behavior ) so I guess the very first code author is trying to do something to make consuming faster.
I am trying to find some posts discussing on this but I can't find a very certain answer.
What I get so far is:
Each channel will have a separate dispatch thread
The operation commands on the same channel is serialized even though they are called in multiple thread
So
creating multiple consumers thus multiple channels will have multiple dispatch threads, but I don't think it provided a better performance to message dispatching since the dispatch should far from enough with one single thread.
The operation of ack will can be paralized in different channels, I am not quite sure this will give any better performances.
Since more channels consume more system resources I wonder is this practice good?
There seem to be a few things going on here, so let's try to look at this scenario from a holistic perspective.
For starters, it sounds like the original designer of this code understood some basics about RabbitMQ (or learned a few things by trial and error), but may have had trouble putting all the pieces together- hopefully I can help.
RabbitMQ connections are, in reality, AMQP-over-TCP connections (and thus are somewhere around the session layer of the OSI model). TCP connections are supposed to be opened up and used until some sort of network interruption or application shutdown closes them (and for this reason, AMQP has trouble with firewalls and other smart network devices). Using a single TCP connection for message processing activities for a single logical process is a good idea, as creating and destroying TCP connections is usually an expensive process for the computer, which leads to
RabbitMQ channels are used to multiplex communication streams in the AMQP-Over-TCP connection (and are defined in the AMQP Protocol Spec). All they do is specify an integer value (I can't remember the number of bytes, but it doesn't matter anyway) used to preface the subsequent command or response on a TCP connection. Most AMQP operations are channel-specific. For the purposes of higher-level operations, channels are treated similar to connections, as they are application-level constructs.
Now, where I think the question starts to go off the rails a bit is here:
The messages from the queue are massive(millions messages for one
single producing behavior ) so I guess the very first code author is
trying to do something to make consuming faster.
A fundamental assumption about a system which uses queues is that messages are consumed at approximately the same rate that they are produced. Queues exist to buffer uneven producing activities. The mathematics and statistics of how queues work are quite interesting, and assuming the production of messages is done in response to some real-world stimulus, your system is virtually guaranteed to behave in a predictable manner. Therefore, your design goal is to ensure that there are enough consumers to process the messages that are produced, and to respond to changing conditions as needed. Your goal should not be to "speed up" the consumers (unless they have some specific issue), but rather to have enough consumers to process the total load.
Further, the average number of items in the queue at any time should approach zero. It is usually a good idea to have overcapacity so that you don't wind up with an unstable situation where messages start accumulating in the queue (and the queue ends up looking like the Stack Overflow Close Vote Queue).
And that brings us to an attempt to answer your fundamental question, which seems to deal with threading and possibly detailed implementation of the Java client, which I will readily admit I have not used (I'm a .NET guy).
Here are some design guidelines for your software:
Ensure that a single thread uses no more than one channel.
Use one TCP connection per logical consuming process.
Balance the number of logical processes on a single physical machine such that resource contention is not a problem (you don't want to starve your consumers of computer resources).
Try to use BASIC.GET as opposed to a push-based consumer. Use of consumers is difficult in practice, and there is no performance benefit at the protocol level over a BASIC.GET. Note I do not know if the Java library has implemented these differently such that it does cause a performance difference- stranger things have been known to happen.
If you do use consumers, make sure pre-fetch is set to 0 (disabled) and that AutoAck is set to false if reliable processing is important (most applications require reliable processing). Along with this, make sure you are acknowledging messages upon completion of processing!
Periodically reboot your consuming threads, channels, and processors - or do a BASIC.Recover. There are degrees of randomness that will result in unacknowledged messages accumulating over time, and this will deal with it.
Again, if you prefer to use consumers, generally speaking to share consumers across channels is a bad idea. Each consumer should get its own channel.
I have a scenario with these particular demands:
Production ready & stable.
Point to point connection, with the producer behind a firewall and a consumer in the cloud. It might be possible to split the traffic between a couple of producers\consumers, but all the traffic still has to traverse a single WAN connection which will probably be the bottleneck.
High throughput - something along the order of 300 Mb/sec (may be up to 1Gb!). Message sizes vary from ~1KB to possibly several MBs.
Guaranteed delivery a must - every message has to arrive at the consumer eventually, so we need to start saving messages to disk in the event of a momentary network outage or risk running out of memory.
Message order is not important, messages are timestamped and can be re-arranged at the consumer.
Highly preferable but not as important - should run on both linux & windows (JVM seems the obvious choice)
I've been looking at so many MQs lately, and I don't have any hands-on experience with any.
Thought it will be a better idea to ask someone with experience.
We're considering mostly Kafka, but I'm not sure it's the best for our use case, seems to be tailored to distributed deployment & mutliple topics\consumers\producers. Also, definitely not production ready on windows.
What about Apache ActiveMQ or Apollo\Artemis? RabbitMQ seems not to be a good fit for our performance requirements. Or maybe there's some Java library that has the features we need without a middleman broker?
Any help making sense of this kludge would be greatly appreciated.
If anyone comes across this, we went with Kafka in the end. Its performance is impressive and so far it's very stable on linux. No attempt yet to run it on windows in production deployments.
UPDATE 12/3/2017:
Works fine and very stable on Linux, but on Windows this is not usable in production. Old data never gets deleted due to leaky file handles, the relevant Jira is being ignored since 2013: https://issues.apache.org/jira/browse/KAFKA-1194
I must handle about 100 JMS Queue in a point-to-point messaging architecture. Every queue has a consumer. So I will have 100 consumer threads to handle them. Is it ok?
1)ActiveMQ Support your request(suggest write a connection pool)
2)you should confirm you server configuration whether is ok,when
QPS is high,
Instead of 100 queues, you could use a single queue and provide JMS message properties, having each consumer filter just the messages it wants.
What this does is give you some more options in architecture and deployment. You could have a single process consume multiple type of messages. Depending on your scaling issues, you could have multiple instances of a single consumer spread out among processes/servers/whatever.
You could also have one consumer for all 100 logical queues, reading the property and figuring out where to hand off the message internally, again, depending on whatever design issues you're running into.
Overall, messaging is so light-weight that it takes a significant volume of messages or a significant size of individual messages to really hurt things. I've got an ActiveMQ app that upon restart might have to process 10K/20K messages and it's complete in seconds. Fairly small messages, but still very possible (and my experience with other MQs is similar performance, as long as your processing is not too overwhelmingly difficult, you should be able to keep up).
I am designing a (near) real-time Netty server to distribute a large number of very small messages to a large number of clients across the internet. In internal, go as fast as you can testing, I found that I could do 10k clients no sweat, but now that we are trying to go across the internet, where the latency, bandwidth etc varies pretty wildly we are running into the dreaded outOfMemory issues, even with 2 gigs of RAM.
I have tried various workarounds(setting the socket stack sizes smaller, setting high and low water marks, cancelling things that are too old), and they help a little, but they seem to only help a little bit. What would some good ways to optimize Netty for sending large #s of small messages without significant delays? Also, the bulk of the message only consists of one kind of message that I don't particularly care if it doesn't arrive. I would use UDP but because we don't control the client, thats not really a possibility. Is it possible to set a separate timeout solely for this kind of message without affecting the other messages?
Any insight you could offer would be greatly appreciated.
usually, if see outOfMemory you can use a thread dump tool to dump the threads. Or use something like jvirtualvm and jconsole to find out which class doesn't get GCed and keep eating your memory.
2Gigs is not big for 64 bit machines nowadays.Try to turn that a number bigger to about 3 or 4 G to see if you don't hit OOM.
If you find you can handle 10k connections easily in LAN, try to add a small delay in your netty handler. Check what happens.
You might want to look into load balancing approach. It is used to distribute the workload across the distributed system using both hardware and software. The details of what is suitable for your system depend on several factors which includes hardware upgrade, etc. Certainly, 2GB of RAM is fairly small to server 10k users and you will need to increase this limit.
You don't say whether the subscription stream is constant or bursty. You also don't say whether there is a minimum number of messages / second the client must support.
Given that I don't know anything about Redis, are any of the following practical?
For the messages you don't care about, if channel.isWritable() == false, discard immediately. Unfortunately I don't know of a way to cancel messages that are in Netty's send buffer. You wouldn't be able to cancel messages that have been passed to the TCP send buffer anyway so it's not really something to rely on.
Slow reception from the subscription to the rate of the slowest client.
Determine which clients can't keep up (maybe use the write timeout handler) and move them to a separate subscription which can be slowed down. Duplicate the published messages to both subscriptions.
Can you split the messages to send to the clients across different subscriptions. If a client can't keep up unsubscribe it from the unimportant messages.
If your average send rate is higher than the client can support over time then there isn't really a solution other than negotiating a change in requirements to reduce the maximum allowable throughput.
I've been asked to design and implement a system for receiving a high volume of automated sensor data from a large number of devices. This data will be produced at regular intervals and sent to the server as xml in an http post. The devices will keep resending the same data if they don't receive a specific acknowledgment from the server. Some potentially heavy duty processing of this data will need to occur before it's inserted to a number of tables in the main database via a transaction, and additionally some data points will need to be enqueued to be re-directed to other external urls.
I'm planning on using a Java application server (leaning towards GlassFish) with a servlet to receive the incoming data. I'd like to implement some kind of queuing mechanism to store the data temporarily so that the response back to the sensor isn't dependent on all the intermediate processing. Separate independent queues are also a requirement for the data re-direction piece. After doing some research the two main options seem to be:
1) Install a database on the app server and use tables for the various queues. The queues would be processed by a Java application, either running in the app server or standalone as it's own service.
2) Use a database backed JMS solution to implement the queuing.
I'm not that familiar with JMS but from what I've read it seems to be the better solution in this case. The primary requirement is that no sensor data ever be lost or dropped from the queue before being processed and that it be processed more or less sequentially. We'd also like to make it easy to halt the processing of some of the queues at certain times but still have them accumulate data and for these messages to never automatically expire.
With strategy 1 it's obvious to me how to meet these requirements but it may be less robust and scalable, and more complex to develop than strategy 2, since I'll need to write my own multi-threaded code to handle the various independent queues. I'm wondering what the potential pitfalls could be in using JMS queues for this purpose since I've never worked with them before.
Data integrity is a big issue so I need to make sure JMS can guarantee no data loss in the event of a server reboot, power outage, or if the queue gets very large for some reason. For instance could a problem completing transactions to the main database for a period of time potentially cause the JVM to run out of memory, crash, and lose all accumulated data? (This would be the nightmare scenario).
Also, I was wondering if there would be any way to pause the JMS queue processing via an app server admin tool or to easily see what's in the queue (I would be enqueuing an object which would be the message xml plus some other data, including timestamp received, etc.) I've read a few posts on here that deal with related issues but wanted to get some direct feedback. Basically I'd like to know of instances (if any) where JMS is not an appropriate queuing solution and if this is one of those cases. Any advice is greatly appreciated.
Kaleb's answer talks about the benefits of JMS quite eloquently, but since you're asking about pitfalls, here's what I can think of.
Not all JMS implementations are equal. In theory you can use whatever implementation suits your needs, but unless you're prepared to do some serious load testing and failure condition testing, you can't know that a particular implementation isn't going to fail under your particular use case.
Most JMS use a transactional datastore like a relational database as their back end. That means that rather than writing directly to whatever datastore you're familiar with, you have to rely on the JMS implementation's extra layer between you and that stored messages.
While swapping JMS implementations to find the one that perfectly fits your needs may seem like a simple endeavor because of the homogeneous JMS API, the critical features for failure handling, JMS server monitoring, and all the other cool stuff that exists above and beyond messaging is going to be a hassle to deal with if you do change your implementation.
That said, I think you'd be crazy to write to the DB yourself instead of going with JMS. On the first point, ActiveMQ is a venerable JMS server used in many enterprise environments. On the second point, the fact is you'd just end up writing that extra layer yourself in order to implement messaging, and your code won't have the benefit of thousands of eyes (or a set of paid developers who's sole job it is to respond to customers and make sure the JMS implementation is solid). On the third point, well the same ends up being true of your backend datastore. Use JMS, you'll save yourself trouble in the long run.
If you want to go the JMS route, a standalone JMS-compatible message broker (separate from your app server) would be a good choice. Message brokers range from free open-source (like ActiveMQ at http://activemq.apache.org/ or OpenMQ at https://mq.dev.java.net/), to large-scale commercial solutions (IBM's WebSphere MQ at http://www-01.ibm.com/software/integration/wmq/ is one of the largest).
Message brokers offer guaranteed delivery (provided the server's up and listening), and you can do quite a bit to ensure that the system is fail-safe including integrated backup broker servers and instant power backup. Broker queues can eventually run out of room if your app server isn't picking up the messages, but you can assign huge queue depth (100's of GB) and have the server send alerts if the messages aren't getting processed and the queue reaches a certain percentage.
Your Java app would then run on a different server entirely, and would connect to the broker and pull messages off of the queue as fast as possible. If the app server crashes or stops picking up messages for any other reason, the broker would just keep all messages in that queue until the app server begins picking them up again.
You will be wanting to implement a poison message queue in your implementation - this is the place that messages unable to be processed after some number of retries will arrive.
You will probably need to write some code that can examine the messages in that queue and re-send them to the appropriate destination after fixing whatever is causing them to fail.
If sequence of message processing is important, a message ending up in the poison queue could mean all processing is halted until that message is corrected.
As far as fault tolerance goes, you can have multiple instances of the consuming services subscribe to the same queue or topic, providing an ability to continue processing even if one or more instances goes down.
Finally, have a watchdog process that pings the various consumers on your message queue, and if one doesn't respond, have it send a message that results in a new instance being started. In this way, your message processing environment can be somewhat self regulating.