Is it OK to use multiple sessions and connections in JMS (ActiveMQ)? - java

I must handle about 100 JMS queues in a point-to-point messaging architecture. Every queue has a consumer, so I will have 100 consumer threads to handle them. Is that OK?

1) ActiveMQ supports this; I'd suggest using a connection pool.
2) You should confirm that your server configuration can handle it when QPS is high.
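For what it's worth, ActiveMQ ships a pooled connection factory in its activemq-pool module. A minimal sketch of 100 listener-based consumers sharing a small pool, assuming ActiveMQ 5.x; the broker URL and queue names are placeholders:

    import org.apache.activemq.ActiveMQConnectionFactory;
    import org.apache.activemq.pool.PooledConnectionFactory;

    import javax.jms.Connection;
    import javax.jms.MessageConsumer;
    import javax.jms.Session;

    public class PooledConsumers {
        public static void main(String[] args) throws Exception {
            PooledConnectionFactory pooled = new PooledConnectionFactory(
                    new ActiveMQConnectionFactory("tcp://localhost:61616"));
            pooled.setMaxConnections(8); // share a small pool of TCP connections across all consumers

            for (int i = 0; i < 100; i++) {
                Connection connection = pooled.createConnection(); // checked out of the pool
                connection.start();
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                MessageConsumer consumer = session.createConsumer(session.createQueue("QUEUE." + i));
                consumer.setMessageListener(message -> {
                    // handle messages for this queue; each session dispatches on its own thread
                });
            }
        }
    }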

Instead of 100 queues, you could use a single queue and provide JMS message properties, having each consumer filter just the messages it wants.
What this does is give you some more options in architecture and deployment. You could have a single process consume multiple types of messages. Depending on your scaling issues, you could have multiple instances of a single consumer spread out among processes/servers/whatever.
You could also have one consumer for all 100 logical queues, reading the property and figuring out where to hand off the message internally, again, depending on whatever design issues you're running into.
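For illustration, a minimal sketch of that selector approach in plain JMS; the jobType property name is invented for the example:

    import javax.jms.MessageConsumer;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;
    import javax.jms.TextMessage;

    public class SelectorExample {
        // Producer side: stamp each message with a routing property.
        static void send(Session session, Queue queue, String jobType, String body) throws Exception {
            TextMessage message = session.createTextMessage(body);
            message.setStringProperty("jobType", jobType); // a property, not payload, so selectors can see it
            MessageProducer producer = session.createProducer(queue);
            producer.send(message);
        }

        // Consumer side: each consumer sees only messages whose jobType matches its selector.
        static MessageConsumer consumerFor(Session session, Queue queue, String jobType) throws Exception {
            return session.createConsumer(queue, "jobType = '" + jobType + "'");
        }
    }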
Overall, messaging is so lightweight that it takes a significant volume of messages, or significantly large individual messages, to really hurt things. I've got an ActiveMQ app that might have to process 10K/20K messages on restart, and it completes in seconds. Those are fairly small messages, but it's still very possible (and my experience with other MQs is similar: as long as your processing is not overwhelmingly difficult, you should be able to keep up).

Related

Limit on the number of queues and verticles in Vert.x

We are now in the process of refactoring our messaging application written in Vert.x. The application processes incoming messages from users. Initially, it was implemented so that there is a single verticle instance that listens to a single queue in the event bus and processes all the incoming messages.
What we are thinking of doing is to refactor it so that it works a bit similar to actor model: we deploy an instance of a verticle for each active user and make it listen to a user-specific queue. This way the verticle instance can maintain user-specific state and the parallelization of the message processing becomes much easier.
The issue, however, is that this would lead to a huge number of verticles deployed (30k - 50k in parallel) and a huge number of queues in the event bus. We would also need to manage the verticles manually (undeploying unused verticles and deploying new ones when a message arrives from a new user).
The question is: is this actor-style architecture a good fit for Vert.x, and can it handle a large number of deployed verticles and event-bus queues at the same time?
There's one major correction to be made here: the EventBus is a single queue. So you won't have a "huge number of queues"; there will be only one. You'll have a huge number of addresses on a single queue.
But is this number really so huge? Can a HashMap of 50K elements be considered huge? Probably not, at least in terms of keys. Note that this applies only to Vert.x in non-clustered mode. Clustered Vert.x is different (it should still work, though).
Now having those verticles is another matter. Each verticle is a separate object, and if you plan to store some data in it, it will be even larger. But if you can afford machines with some decent RAM (16GB+), it should work just fine.
What does concern me in this solution, though, is that you plan to deploy verticles on demand and then undeploy them. That incurs delays, so your users will experience degraded performance for the first message they send.
What you call "actor-style" does not mean that you have to spin up a new verticle instance per user. If you do, you are going to get a system with 98% redundancy.
It's absolutely enough to register an event-bus address for each user and use some sort of persistent storage to keep track of them. Such storage can be any DB for long-term persistence, a cluster-wide SharedMap for short-term, or a combination of both.
Perhaps you don't even need an address-per-user scheme. Such a scheme is nice when users are constantly connected to your system via some sort of EventBusBridge. If that is not the case, you can register a single event-bus address for all users and process messages based on the payload.
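To make the single-address alternative concrete, here is a minimal Vert.x (Java) sketch; the address name and payload fields are invented for the example. Because a standard verticle runs on one event-loop thread, a plain HashMap is enough for per-user state:

    import io.vertx.core.AbstractVerticle;
    import io.vertx.core.json.JsonObject;

    import java.util.HashMap;
    import java.util.Map;

    public class UserRouterVerticle extends AbstractVerticle {
        // Per-user state lives in a map instead of in per-user verticle instances.
        private final Map<String, JsonObject> userState = new HashMap<>();

        @Override
        public void start() {
            vertx.eventBus().<JsonObject>consumer("user.messages", msg -> {
                String userId = msg.body().getString("userId");
                JsonObject state = userState.computeIfAbsent(userId, id -> new JsonObject());
                // ... process msg.body() against this user's state ...
                msg.reply(new JsonObject().put("status", "ok"));
            });
        }
    }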

How many operating system resources are needed for one Java Kafka consumer?

I want to use hundreds of thousands of KafkaConsumers. For example, I need 100_000 consumers for some architectural pattern. Is that OK, or should I refactor my system to use a few consumers for the whole system (for example, 10 consumers instead of 100_000)?
So, my questions are:
Is there a connection pool in KafkaConsumer, or does each consumer create its own connection to the Kafka brokers?
Is there a thread pool in KafkaConsumer, or does each consumer create its own thread? (I hope it does not.)
What is the average memory consumption per KafkaConsumer?
What do you think about such an architectural pattern?
1,2) Consumers request metadata from one of the brokers and then fetch from the broker that leads each partition. Each consumer is able to handle all of its IO from a single thread, as the Java clients are designed around an event loop driven by poll(). You can also build multi-threaded consumers, but then you need to take care of offset management. Refer to Confluent's documentation for more details on the implementation of the Java clients.
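For reference, here is what that single-threaded design looks like with the Java client; the broker address, group id, and topic are placeholders:

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;

    public class SingleThreadedConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "example-group");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("example-topic"));
                while (true) {
                    // All network IO happens inside poll(), on this one thread.
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("offset=%d key=%s value=%s%n",
                                record.offset(), record.key(), record.value());
                    }
                }
            }
        }
    }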
3) According to the Apache Kafka and Confluent Enterprise Reference Architecture:
Consumers use at least 2MB per consumer and up to 64MB in cases of
large responses from brokers (typical for bursty traffic)
4) The number of consumers you've mentioned is huge, so you'd need a very good reason to go for 100,000 consumers. It depends on the scenario, but even Netflix is likely using far fewer than that.
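As a rough sanity check against the numbers above: at the 2 MB lower bound, 100_000 consumers would need about 200 GB of memory for consumer buffers alone, before any application state, which by itself is a strong argument for consolidating consumers.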

RabbitMQ - Is it a good practice to create multiple consumers for a single queue in one application process?

I just started working on a new project backed by RabbitMQ, and multiple consumer instances are created listening to the same queue when the application starts. However, they share the same connection, each using a different channel.
The messages from the queue are massive (millions of messages for a single producing behavior), so I guess the original author was trying to make consuming faster.
I have tried to find posts discussing this, but I can't find a definitive answer.
What I get so far is:
Each channel will have a separate dispatch thread
Operations on the same channel are serialized, even when they are issued from multiple threads
So
creating multiple consumers, and thus multiple channels, will create multiple dispatch threads, but I don't think that improves message-dispatch performance, since a single dispatch thread should be more than sufficient.
Acks can be parallelized across different channels, but I am not sure this yields any better performance.
Since more channels consume more system resources, I wonder whether this practice is good.
There seem to be a few things going on here, so let's try to look at this scenario from a holistic perspective.
For starters, it sounds like the original designer of this code understood some basics about RabbitMQ (or learned a few things by trial and error), but may have had trouble putting all the pieces together- hopefully I can help.
RabbitMQ connections are, in reality, AMQP-over-TCP connections (and thus are somewhere around the session layer of the OSI model). TCP connections are supposed to be opened up and used until some sort of network interruption or application shutdown closes them (and for this reason, AMQP has trouble with firewalls and other smart network devices). Using a single TCP connection for the message processing activities of a single logical process is a good idea, as creating and destroying TCP connections is usually an expensive process for the computer, which leads to the next point:
RabbitMQ channels are used to multiplex communication streams in the AMQP-Over-TCP connection (and are defined in the AMQP Protocol Spec). All they do is specify an integer value (I can't remember the number of bytes, but it doesn't matter anyway) used to preface the subsequent command or response on a TCP connection. Most AMQP operations are channel-specific. For the purposes of higher-level operations, channels are treated similar to connections, as they are application-level constructs.
Now, where I think the question starts to go off the rails a bit is here:
The messages from the queue are massive (millions of messages for one single producing behavior), so I guess the very first code author is trying to do something to make consuming faster.
A fundamental assumption about a system which uses queues is that messages are consumed at approximately the same rate that they are produced. Queues exist to buffer uneven producing activities. The mathematics and statistics of how queues work are quite interesting, and assuming the production of messages is done in response to some real-world stimulus, your system is virtually guaranteed to behave in a predictable manner. Therefore, your design goal is to ensure that there are enough consumers to process the messages that are produced, and to respond to changing conditions as needed. Your goal should not be to "speed up" the consumers (unless they have some specific issue), but rather to have enough consumers to process the total load.
Further, the average number of items in the queue at any time should approach zero. It is usually a good idea to have overcapacity so that you don't wind up with an unstable situation where messages start accumulating in the queue (and the queue ends up looking like the Stack Overflow Close Vote Queue).
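As a back-of-the-envelope illustration of that sizing logic (all rates below are invented for the example):

    public class CapacityCheck {
        public static void main(String[] args) {
            double peakProduceRate = 500.0; // messages per second arriving at peak
            double consumeRate = 60.0;      // messages per second one consumer can process
            double headroom = 1.25;         // 25% overcapacity so the queue drains toward zero
            int consumersNeeded = (int) Math.ceil(peakProduceRate / consumeRate * headroom);
            System.out.println("Consumers needed: " + consumersNeeded); // prints 11
        }
    }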
And that brings us to an attempt to answer your fundamental question, which seems to deal with threading and possibly detailed implementation of the Java client, which I will readily admit I have not used (I'm a .NET guy).
Here are some design guidelines for your software:
Ensure that a single thread uses no more than one channel.
Use one TCP connection per logical consuming process.
Balance the number of logical processes on a single physical machine such that resource contention is not a problem (you don't want to starve your consumers of computer resources).
Try to use BASIC.GET as opposed to a push-based consumer. Use of consumers is difficult in practice, and there is no performance benefit at the protocol level over a BASIC.GET. Note that I don't know whether the Java library implements these differently in a way that does cause a performance difference; stranger things have been known to happen.
If you do use consumers, make sure pre-fetch is set to 0 (disabled) and that AutoAck is set to false if reliable processing is important (most applications require reliable processing). Along with this, make sure you are acknowledging messages upon completion of processing!
Periodically reboot your consuming threads, channels, and processors - or do a BASIC.Recover. There are degrees of randomness that will result in unacknowledged messages accumulating over time, and this will deal with it.
Again, if you prefer to use consumers, sharing consumers across channels is generally a bad idea. Each consumer should get its own channel.
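A minimal sketch that follows these guidelines with the RabbitMQ Java client; the host and queue name are placeholders. One channel is used by exactly one thread, BASIC.GET is used instead of a push-based consumer, and acks are sent only after processing:

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import com.rabbitmq.client.GetResponse;

    public class PollingConsumer {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost");
            try (Connection connection = factory.newConnection();
                 Channel channel = connection.createChannel()) { // used by this thread only
                while (true) {
                    GetResponse response = channel.basicGet("work-queue", false); // autoAck = false
                    if (response == null) {
                        Thread.sleep(100); // queue empty; back off briefly
                        continue;
                    }
                    // ... process response.getBody() ...
                    // Acknowledge only after processing completes successfully.
                    channel.basicAck(response.getEnvelope().getDeliveryTag(), false);
                }
            }
        }
    }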

How many connections to maintain in RabbitMQ?

I am using the RabbitMQ java client.
My app has multiple exchanges and queues. Adopting something similar to the Pub/Sub model.
What is the best practice regarding connections?
Shall I have one connection per app?
I understand the channel model, and the thread (un)safety model. Just not sure if I should have multiple connections or not.
One connection per app is correct.
Within that connection, you will have many channels - where the actual work is done.
You can have hundreds or thousands of message producers and consumers (each on its own channel) inside a single connection.
If you start to see slowdown in your RMQ setup because you're doing too much work, look at clustering RMQ and/or standing up multiple instances of your app.
But you would still maintain 1 connection per app instance.
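A minimal sketch of that pattern with the RabbitMQ Java client; the host and queue names are placeholders. One shared Connection, one Channel per consumer:

    import com.rabbitmq.client.Channel;
    import com.rabbitmq.client.Connection;
    import com.rabbitmq.client.ConnectionFactory;
    import com.rabbitmq.client.DeliverCallback;

    public class OneConnectionManyChannels {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory = new ConnectionFactory();
            factory.setHost("localhost");
            Connection connection = factory.newConnection(); // the app's single connection

            for (String queue : new String[] {"orders", "payments", "notifications"}) {
                Channel channel = connection.createChannel(); // one channel per consumer
                DeliverCallback onDeliver = (consumerTag, delivery) -> {
                    // ... process delivery.getBody() ...
                    channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
                };
                channel.basicConsume(queue, false, onDeliver, consumerTag -> { });
            }
        }
    }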
It will depend on the volume of messages you have. If it really is huge, maybe two or three connections could do it, but one per application seems to be the best choice.

Potential pitfalls in using a JMS queue?

I've been asked to design and implement a system for receiving a high volume of automated sensor data from a large number of devices. This data will be produced at regular intervals and sent to the server as xml in an http post. The devices will keep resending the same data if they don't receive a specific acknowledgment from the server. Some potentially heavy duty processing of this data will need to occur before it's inserted to a number of tables in the main database via a transaction, and additionally some data points will need to be enqueued to be re-directed to other external urls.
I'm planning on using a Java application server (leaning towards GlassFish) with a servlet to receive the incoming data. I'd like to implement some kind of queuing mechanism to store the data temporarily so that the response back to the sensor isn't dependent on all the intermediate processing. Separate independent queues are also a requirement for the data re-direction piece. After doing some research the two main options seem to be:
1) Install a database on the app server and use tables for the various queues. The queues would be processed by a Java application, either running in the app server or standalone as its own service.
2) Use a database backed JMS solution to implement the queuing.
I'm not that familiar with JMS but from what I've read it seems to be the better solution in this case. The primary requirement is that no sensor data ever be lost or dropped from the queue before being processed and that it be processed more or less sequentially. We'd also like to make it easy to halt the processing of some of the queues at certain times but still have them accumulate data and for these messages to never automatically expire.
With strategy 1 it's obvious to me how to meet these requirements but it may be less robust and scalable, and more complex to develop than strategy 2, since I'll need to write my own multi-threaded code to handle the various independent queues. I'm wondering what the potential pitfalls could be in using JMS queues for this purpose since I've never worked with them before.
Data integrity is a big issue so I need to make sure JMS can guarantee no data loss in the event of a server reboot, power outage, or if the queue gets very large for some reason. For instance could a problem completing transactions to the main database for a period of time potentially cause the JVM to run out of memory, crash, and lose all accumulated data? (This would be the nightmare scenario).
Also, I was wondering if there would be any way to pause the JMS queue processing via an app server admin tool or to easily see what's in the queue (I would be enqueuing an object which would be the message xml plus some other data, including timestamp received, etc.) I've read a few posts on here that deal with related issues but wanted to get some direct feedback. Basically I'd like to know of instances (if any) where JMS is not an appropriate queuing solution and if this is one of those cases. Any advice is greatly appreciated.
Kaleb's answer talks about the benefits of JMS quite eloquently, but since you're asking about pitfalls, here's what I can think of.
Not all JMS implementations are equal. In theory you can use whatever implementation suits your needs, but unless you're prepared to do some serious load testing and failure condition testing, you can't know that a particular implementation isn't going to fail under your particular use case.
Most JMS implementations use a transactional datastore, like a relational database, as their back end. That means that rather than writing directly to a datastore you're familiar with, you have to rely on the JMS implementation's extra layer between you and the stored messages.
While swapping JMS implementations to find the one that perfectly fits your needs may seem like a simple endeavor because of the homogeneous JMS API, the critical features for failure handling, JMS server monitoring, and all the other cool stuff that exists above and beyond messaging is going to be a hassle to deal with if you do change your implementation.
That said, I think you'd be crazy to write to the DB yourself instead of going with JMS. On the first point, ActiveMQ is a venerable JMS server used in many enterprise environments. On the second point, the fact is you'd just end up writing that extra layer yourself in order to implement messaging, and your code won't have the benefit of thousands of eyes (or a set of paid developers whose sole job it is to respond to customers and make sure the JMS implementation is solid). On the third point, well, the same ends up being true of your backend datastore. Use JMS; you'll save yourself trouble in the long run.
If you want to go the JMS route, a standalone JMS-compatible message broker (separate from your app server) would be a good choice. Message brokers range from free open-source (like ActiveMQ at http://activemq.apache.org/ or OpenMQ at https://mq.dev.java.net/), to large-scale commercial solutions (IBM's WebSphere MQ at http://www-01.ibm.com/software/integration/wmq/ is one of the largest).
Message brokers offer guaranteed delivery (provided the server's up and listening), and you can do quite a bit to ensure that the system is fail-safe including integrated backup broker servers and instant power backup. Broker queues can eventually run out of room if your app server isn't picking up the messages, but you can assign huge queue depth (100's of GB) and have the server send alerts if the messages aren't getting processed and the queue reaches a certain percentage.
Your Java app would then run on a different server entirely, and would connect to the broker and pull messages off of the queue as fast as possible. If the app server crashes or stops picking up messages for any other reason, the broker would just keep all messages in that queue until the app server begins picking them up again.
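To illustrate the guaranteed-delivery point in plain JMS (the queue name is a placeholder): persistent messages on the producer side, and a transacted session on the consumer side so a message leaves the queue only after the database work commits:

    import javax.jms.Connection;
    import javax.jms.DeliveryMode;
    import javax.jms.Message;
    import javax.jms.MessageConsumer;
    import javax.jms.MessageProducer;
    import javax.jms.Queue;
    import javax.jms.Session;

    public class DurableProcessing {
        static void produce(Connection connection, String xml) throws Exception {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("sensor.data");
            MessageProducer producer = session.createProducer(queue);
            producer.setDeliveryMode(DeliveryMode.PERSISTENT); // broker persists the message to disk
            producer.send(session.createTextMessage(xml));
        }

        static void consume(Connection connection) throws Exception {
            // Transacted session: the message is removed from the queue only when commit() succeeds.
            Session session = connection.createSession(true, Session.SESSION_TRANSACTED);
            MessageConsumer consumer = session.createConsumer(session.createQueue("sensor.data"));
            Message message = consumer.receive();
            // ... heavy processing, then insert into the main database ...
            session.commit(); // on failure, session.rollback() puts the message back for redelivery
        }
    }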
You will want to implement a poison message queue in your implementation: this is where messages that cannot be processed after some number of retries end up.
You will probably need to write some code that can examine the messages in that queue and re-send them to the appropriate destination after fixing whatever is causing them to fail.
If sequence of message processing is important, a message ending up in the poison queue could mean all processing is halted until that message is corrected.
As far as fault tolerance goes, you can have multiple instances of the consuming services subscribe to the same queue or topic, providing an ability to continue processing even if one or more instances goes down.
Finally, have a watchdog process that pings the various consumers on your message queue, and if one doesn't respond, have it send a message that results in a new instance being started. In this way, your message processing environment can be somewhat self regulating.
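If the broker happens to be ActiveMQ, the poison-message behavior described above can be configured on the client's connection factory; a sketch with illustrative limits (after the cap, the broker moves the message to its dead-letter queue, ActiveMQ.DLQ by default):

    import org.apache.activemq.ActiveMQConnectionFactory;
    import org.apache.activemq.RedeliveryPolicy;

    public class PoisonMessageConfig {
        public static ActiveMQConnectionFactory configuredFactory() {
            ActiveMQConnectionFactory factory =
                    new ActiveMQConnectionFactory("tcp://localhost:61616");
            RedeliveryPolicy policy = factory.getRedeliveryPolicy();
            policy.setMaximumRedeliveries(5);       // give up after 5 failed attempts
            policy.setInitialRedeliveryDelay(1000); // 1s backoff before the first retry
            return factory;
        }
    }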
