I have an MDB that gets subscribed to a topic which sends messages whose content is eventually persisted to a DB.
I know MDBs are pooled, and therefore the container is able to handle more than one incoming message in parallel. In my case, the order in which those messages are consumed (and then persisted) is important. I don't want an MDB instance pool to consume and then persist messages in a different order than the one in which they were published to the JMS topic.
Can this be an issue? If so, is there a way of telling the container to follow strict incoming order when consuming messages?
Copied from there:
To ensure that receipt order matches the order in which the client sent the message, you must do the following:
Set max-beans-in-free-pool to 1 for the MDB. This ensures that the MDB is the sole consumer of the message.
If your MDBs are deployed on a cluster, deploy them to a single node in the cluster, [...].
To ensure message ordering in the event of transaction rollback and recovery, configure a custom connection factory with MessagesMaximum set to 1, and ensure that no redelivery delay is configured. For more information see [...].
You should be able to limit the size of the MDB pool to 1, thus ensuring that the messages are processed in the correct order.
Of course, if you still want some parallelism in there, then you have a couple of options, but that really depends on the data.
If certain messages have something in common and the order of processing only matters within that group of messages that share a common value, then you may need to have multiple topics, or use Queues and a threadpool.
If on the other hand certain parts of the logic associated with the arrival of a message can take place in parallel, while other bits cannot, then you will need to split the logic up into the parallel-ok and parallel-not-ok parts, and process those bits accordingly.
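As a rough illustration of the "order only matters within a group" case, here is a minimal sketch that routes work to one of several single-threaded lanes by a key. The class name KeyedExecutor and the idea of using an entity ID as the key are assumptions made for the example, not something from the original answers. It uses plain java.util.concurrent; inside a Java EE container you would normally rely on container-managed concurrency rather than creating your own threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: routes work to one of N single-threaded lanes by key.
class KeyedExecutor {
    private final List<ExecutorService> lanes = new ArrayList<>();

    KeyedExecutor(int laneCount) {
        for (int i = 0; i < laneCount; i++) {
            lanes.add(Executors.newSingleThreadExecutor());
        }
    }

    // Same key -> same lane, so tasks for one key run in submission order.
    // Different keys may share a lane or run in parallel on different lanes.
    void submit(String key, Runnable task) {
        int lane = Math.floorMod(key.hashCode(), lanes.size());
        lanes.get(lane).execute(task);
    }

    // Stops accepting work and waits (up to 10 s per lane) for queued tasks.
    void shutdownAndWait() {
        for (ExecutorService lane : lanes) {
            lane.shutdown();
        }
        try {
            for (ExecutorService lane : lanes) {
                lane.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        KeyedExecutor executor = new KeyedExecutor(4);
        for (int i = 0; i < 3; i++) {
            int seq = i;
            executor.submit("order-A", () -> System.out.println("order-A #" + seq));
            executor.submit("order-B", () -> System.out.println("order-B #" + seq));
        }
        executor.shutdownAndWait();
    }
}
```

The ordering guarantee only holds if a single thread submits the tasks for a given key in the right order to begin with, which is the role a pool of size 1 would play on the receiving side.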
As mentioned in the answer,
A message queue is a one-way pipe: one process writes to the queue, and another reads the data in the order
SysV message queue is one example
So, my understanding is,
one message queue is used by two processes, where one process (the producer) inserts an item into the queue and another process (the consumer) consumes the item from the queue.
1) Is a RabbitMQ or Kafka message queue a 1:1 messaging system, used by only two processes, where one process writes and the other reads?
2) After the consumer consumes an item, does the item get deleted? If not, why do we need a queue data structure? Why not just shared memory?
Kafka is not strictly a 1:1 messaging system. Multiple producers can write into a topic and multiple consumers can read from it. Moreover, in Kafka, multiple consumers can be assigned to the same or to different consumer groups. Every message is consumed by only one consumer from every consumer group (load balancing) and all consumer groups receive a copy of every message (of course, if they are subscribed to the corresponding topics and no messages are lost). A good description of this process can be found in this article: Scalability of Kafka Messaging using Consumer Groups.
In Kafka all messages are persisted on disk and stored until compaction reaps them, or retention.ms passes, or the log size limit is exceeded. That's a very high-level point of view and there are a lot of nuances here. For example, messages are stored in segments, and every segment contains multiple messages. When the retention period passes for a message, it is not removed from the segment at that moment; instead Kafka waits until all messages in that segment have expired and deletes the whole segment at once. Also, retention could kick in before the log exceeds the maximum size or vice versa: the log can exceed the size limit even before the retention period passes. And so on. Just read the docs and pay attention to topics about "log cleaner" and "retention".
After a Kafka consumer reads a message, the message is neither compacted nor expired. So it is not removed from the log and stays there. It also means that every message can be re-read by a consumer if needed (until it is deleted completely). This can be useful if some of your consumers went offline for some reason and were not able to process the messages as they came in. It also allows interesting features like transaction replays and so on. Persistence is one of Kafka's features.
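To make the "reading does not delete" point concrete, here is a toy in-memory model of an append-only log with one read position per consumer group. It is only a sketch of the idea and does not use or imitate Kafka's real client API; the class ToyLog and its methods are invented for the example, and partitions, retention and compaction are left out entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: not Kafka's API, just the "log + per-group offset" idea.
class ToyLog {
    private final List<String> records = new ArrayList<>();
    private final Map<String, Integer> offsets = new HashMap<>(); // group -> next index

    void append(String record) {
        records.add(record);
    }

    // Returns the next unread record for this group, or null if it is caught up.
    // The record stays in the log; only the group's offset moves forward.
    String poll(String group) {
        int next = offsets.getOrDefault(group, 0);
        if (next >= records.size()) {
            return null;
        }
        offsets.put(group, next + 1);
        return records.get(next);
    }

    // Rewinds (or fast-forwards) a group so that it re-reads from the given position.
    void seek(String group, int offset) {
        offsets.put(group, offset);
    }

    int size() {
        return records.size();
    }

    public static void main(String[] args) {
        ToyLog log = new ToyLog();
        log.append("m1");
        log.append("m2");
        System.out.println(log.poll("billing"));   // each group gets its own copy
        System.out.println(log.poll("audit"));
        log.seek("billing", 0);                    // replay from the start
        System.out.println(log.poll("billing"));
    }
}
```

Each group sees every record once, in order, and a group that rewinds its offset can replay records that are still in the log.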
Shared memory? Well, shared memory (for example SysV or POSIX shared memory segments) can be shared between processes, but only on a single host, and it gives you no ordering, notification or delete-on-consume semantics; you would have to build the queue on top of it yourself. And there is absolutely no way to have "shared memory" when your app runs on multiple hosts. However, there are in-memory brokers. Redis, for example, can be used as a message broker, and it keeps everything in memory. If such a broker restarts without persistence configured, you lose everything; Redis offers two persistence mechanisms (RDB snapshots and the append-only file) specifically to handle restarts.
I am not sure about RabbitMQ, but by default it probably deletes a message once the consumer has acknowledged it. So it is closer to the 1:1 mental model. However, RabbitMQ employs disk persistence as well.
We're using ActiveMQ (5.14.5).
We have a single producer, and multiple consumers on the same queue.
From time to time we set JMSXGroupID to group several messages together to be consumed on a single consumer. This works as expected.
In parallel, the producer continues to send non-grouped messages (i.e. without JMSXGroupID)
The problem:
We noticed that once a consumer was selected to process a specific group, it no longer gets the non-grouped messages. Even if it is completely idle. The non-grouped messages are always sent to the other consumers.
The rogue consumer returns to consume non-grouped messages only after we close the group that was assigned to it (by setting JMSXGroupSeq=-1).
Is this a normal behavior? We expected that non-grouped messages will continue to be delivered in the same round-robin fashion as usual, to all consumers.
We were unable to find a clear reference to this in ActiveMQ documentation.
There's a bit of a no-win situation for the message broker here. If there are active message groups in play, the broker has to assume that further messages will be produced that fall into those groups. So a message consumer that has become bound to a particular group needs to remain available to consume later messages of that group, rather than ungrouped messages. After all, an ungrouped message can be handled elsewhere, while a grouped message can't.
However, we also want to have a fair-ish distribution of messages between consumers. So it makes sense that a consumer that is bound to a group, or groups, could take some work when it is idle.
But how do we know it is idle? What happens if a consumer takes a bunch of ungrouped messages (and don't forget the default pre-fetch behaviour), and then new messages arrive that match its specific group?
The fact that closing a group restores the "group consumer" to default behaviour suggests to me that this is not a bug, but a deliberate attempt to make a reasonable compromise in a tricky situation. It seems reasonable to me to ask for a feature to be added, where "group consumers" can take part in ungrouped workload, but I would be inclined to see that as an enhancement.
Just my $0.02, of course.
A Java EE application gets data delivered by a JMS queue. Unfortunately, some of the delivered messages depend on each other, so they have to be processed in the correct order. As far as I understand it, I can't rely on JMS here (or can I? I know that they will be sent in the correct order).
Additionally, there may be one kind of message that will be split by the provider of the messages, so that in some cases thousands of these messages will be sent to the application, all of them related to a specific entity. It would not be necessary to handle them one by one (which would be the easiest way, e.g. in an MDB), and I fear performance will be bad if I handle them this way, because I always have to save some information in the database. So I'd prefer to batch them in some way and handle them all together in one transaction. But I don't know how to do this, since I haven't found a way to batch messages while reading from a queue. I could get all messages out of the queue every second or so and process them, but then I don't know for sure whether the messages that depend on each other are already in there.
So I think I need some kind of buffering or caching between receiving and processing the JMS messages, but I don't see a good approach at the moment.
Any ideas how to handle this scenario?
Section 4.4.10 of the JMS spec states that:
JMS defines that messages sent by a session to a destination must be received in the order in which they were sent (see Section 4.4.10.2 “Order of Message Sends,” for a few qualifications). This defines a partial ordering constraint on a session’s input message stream.
(The qualifications are mainly about QoS and non-persistent messages).
The spec continues:
JMS does not define order of message receipt across destinations or across a destination’s messages sent from multiple sessions. This aspect of a session’s input message stream order is timing-dependent. It is not under application control.
So, relying on ordering in your JMS client really comes down to the session scope of your producer and the number of producers.
You could also consider grouping the related messages in a transaction to ensure atomicity of whatever operation you're performing.
Hope that helps a bit.
EDIT: I assume you know "related messages"/dependencies by using some sort of correlationID? I'm getting the idea that your problem is very similar to eg. TCP, so perhaps you could use the algorithms from that area to ensure ordered delivery of your messages?
A JMS Queue will deliver them in the correct order, as long as they come from a single producer session and are read by a single consumer. It is, after all, a Queue (and you can't cut in line).
I have a JMS Queue that is populated at a very high rate ( > 100,000/sec ).
There can also be multiple messages pertaining to the same entity every second (several updates to an entity, with each update sent as a separate message).
On the other end, I have one consumer that processes each message and sends it to other applications.
Now, the whole setup is slowing down since the consumer is not able to cope with the rate of incoming messages.
Since there is an SLA on the rate at which the consumer processes messages, I have been toying with the idea of having multiple consumers acting in parallel to speed up the process.
So, what I'm thinking of doing is:
Multiple consumers acting independently on the queue.
Each consumer is free to grab any message.
After grabbing a message, make sure it's the latest version of the entity. For this part, I can check with the application that processes this entity.
If it's not the latest, bump the version up and try again.
I have been looking up the Integration patterns, JMS docs so far without success.
I would welcome ideas to tackle this problem in a more elegant way along with any known APIs, patterns in Java world.
ActiveMQ solves this problem with a concept called "Message Groups". While it's not part of the JMS standard, several JMS-related products work similarly. The basic idea is that you assign each message to a "group" which indicates messages that are related and have to be processed in order. Then you set it up so that each group is delivered only to one consumer. Thus you get load balancing between groups but guarantee in-order delivery within a group.
Most EIP frameworks and ESBs have customizable resequencers. If the number of entities is not too large, you can have a queue per entity and resequence at the beginning.
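For readers who have not met the pattern, a resequencer can be sketched in a few lines. This version assumes that the producer stamps every message with a consecutive sequence number per entity, which is an assumption of the example and not something JMS gives you for free. The class name Resequencer is hypothetical and not taken from any framework. The buffer is unbounded and there is no timeout for gaps, both of which a real resequencer would need.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: buffers out-of-order messages and releases them in sequence.
class Resequencer {
    private final Map<Long, String> pending = new HashMap<>();
    private long nextExpected;

    Resequencer(long firstSequence) {
        this.nextExpected = firstSequence;
    }

    // Accepts a message in any order and returns the messages that can now be
    // released in sequence (possibly none, possibly several).
    List<String> accept(long sequence, String payload) {
        List<String> released = new ArrayList<>();
        if (sequence < nextExpected) {
            return released; // already released earlier: treat as a duplicate and drop it
        }
        pending.put(sequence, payload);
        // Drain the buffer while the next expected sequence number is present.
        while (pending.containsKey(nextExpected)) {
            released.add(pending.remove(nextExpected));
            nextExpected++;
        }
        return released;
    }
}
```

With one Resequencer per entity, parallel consumers can receive messages in any order and still hand them to the processing step in the order they were produced.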
For those interested in a way to solve this:
Use Recipient List EAI pattern
As the question is about JMS, we can take a look into an example from Apache Camel website.
This approach is different from other patterns like CBR and Selective Consumer because the consumer is not aware of what message it should process.
Let me put this on a real world example:
We have an Order Management System (OMS) which sends off Orders to be processed by the ERP. The Order then goes through 6 steps, and each of those steps publishes an event on the Order_queue, informing the new Order's status. Nothing special here.
The OMS consumes the events from that queue, but MUST process the events of each Order in the very same sequence they were published. The rate of messages published per minute is much greater than the consumer's throughput, hence the delay increases over time.
The solution requirements:
Consume in parallel, including as many consumers as needed to keep queue size in a reasonable amount.
Guarantee that events for each Order are processed in the same publish order.
The implementation:
On the OMS side
The OMS process responsible for sending Orders to the ERP determines the consumer that will process all events of a certain Order and sends the Recipient name along with the Order.
How does this process know which Recipient to use? Well, you can use different approaches, but we used a very simple one: Round Robin.
On ERP
As it keeps the Recipient's name for each Order, it simply sets up each message to be delivered to the desired Recipient.
On OMS Consumer
We've deployed 4 instances, each one using a different Recipient name and concurrently processing messages.
One could say that we created another bottleneck: the database. But it is not true, since there is no concurrency on the order line.
One drawback is that the OMS process which sends the Orders to the ERP must keep knowledge about how many Recipients are working.
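The Round Robin assignment on the OMS side could look roughly like the sketch below. The class RecipientAssigner and the recipient names are made up for the example. In the setup described above the chosen name travels with the Order to the ERP, so the in-memory map here only stands in for wherever that association is actually stored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: picks a Recipient per Order in round-robin fashion and
// keeps returning the same Recipient for that Order afterwards.
class RecipientAssigner {
    private final List<String> recipients;
    private final Map<String, String> byOrder = new HashMap<>();
    private int next = 0;

    RecipientAssigner(List<String> recipients) {
        this.recipients = new ArrayList<>(recipients);
    }

    // The first call for an order takes the next recipient in the rotation;
    // later calls for the same order return the recipient chosen the first time.
    synchronized String recipientFor(String orderId) {
        return byOrder.computeIfAbsent(orderId, id -> {
            String chosen = recipients.get(next);
            next = (next + 1) % recipients.size();
            return chosen;
        });
    }
}
```

Because every event of an Order carries the same Recipient name, all of its events land on the same consumer instance and keep their publish order, while different Orders spread across the instances.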
There is one controlling entity and several 'worker' entities. The controlling entity requests certain data from the worker entities, which they will fetch and return in their own manner.
Since the controlling entity can be agnostic about the worker entities (and the worker entities can be added or removed at any point), putting a JMS provider in between them sounds like a good idea. That's the assumption at least.
Since it is a one-to-many relation (controller -> workers), a JMS Topic would be the right solution. But since the controlling entity depends on the return values of the workers, request/reply functionality would be nice as well (somewhere I read about the TopicRequester, but I cannot seem to find a working example). Request/reply is typical Queue functionality.
As an attempt to use topics in a request/reply sort of way, I created two JMS topics: request and response. The controller publishes to the request topic and is subscribed to the response topic. Every worker is subscribed to the request topic and publishes to the response topic. To match requests and responses, the controller subscribes to the response topic once per request, with a filter (using a session id as the value). The messages that workers publish to the response topic have the session id associated with them.
Now this does not feel like a solution (rather it uses JMS as a hammer and treats the problem (and some more) as a nail). Is JMS in this situation a solution at all? Or are there other solutions I'm overlooking?
Your approach sort of makes sense to me. I think a messaging system could work, but I think using topics is wrong. Take a look at the wiki page for Enterprise Service Bus. It's a little more complicated than you need, but the basic idea for your use case is that you have a worker that is capable of reading from one queue, doing some processing and adding the processed data to another queue.
The problem with a topic is that all workers will get the message at the same time and they will all work on it independently. It sounds like you only want one worker at a time working on each request. I think you have it as a topic so different types of workers can also listen to the same queue and only respond to certain requests. For that, you are better off just creating a new queue for each type of work. You could potentially have them in pairs, so you have a work_a_request queue and work_a_response queue. Or if your controller is capable of figuring out the type of response from the data, they can all write to a single response queue.
If you haven't chosen a message queue vendor yet, I would recommend RabbitMQ, as it's easy to set up, easy to add new queues to (especially dynamically) and has really good Spring support (although most major messaging systems have Spring support, and you may not even be using Spring).
I'm also not sure what you are accomplishing with the filters. If you ensure the messages to the workers contain all the information needed to do the work, and the response messages contain all the information your controller needs to finish the processing, I don't think you need them.
I would simply use two JMS queues.
The first one is the one that all of the requests go on. The workers will listen to the queue, and process them in their own time, in their own way.
Once complete, they will bundle the request with the response and put that on another queue for the final process to handle. This way there's no need for the submitting process to retain the requests; they just follow along with the entire procedure. A final process will listen to the second queue and handle the request/response pairs appropriately.
If there's no need for the message to be reliable, or if there's no need for the actual processes to span JVMs or machines, then this can all be done with a single process and standard java threading (such as BlockingQueues and ExecutorServices).
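A single-process version of the two-queue idea might look like the following sketch, built only on BlockingQueue and ExecutorService. The class TwoQueuePipeline and its Pair type are invented for the example, and upper-casing the request is just a stand-in for the real work a worker would do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the request queue / response queue pipeline in one JVM.
class TwoQueuePipeline {
    // Request and response travel together, so the submitter need not retain state.
    static final class Pair {
        final String request;
        final String response;

        Pair(String request, String response) {
            this.request = request;
            this.response = response;
        }
    }

    // Runs the requests through `workers` threads and returns the
    // request/response pairs in the order in which they completed.
    static List<Pair> run(List<String> requests, int workers) {
        BlockingQueue<String> requestQueue = new LinkedBlockingQueue<>(requests);
        BlockingQueue<Pair> responseQueue = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                String request;
                // poll() returns null once the request queue has been drained.
                while ((request = requestQueue.poll()) != null) {
                    responseQueue.add(new Pair(request, request.toUpperCase()));
                }
            });
        }
        pool.shutdown();
        List<Pair> results = new ArrayList<>();
        try {
            // The "final process": consume one pair per submitted request.
            for (int i = 0; i < requests.size(); i++) {
                Pair pair = responseQueue.poll(10, TimeUnit.SECONDS);
                if (pair == null) {
                    throw new IllegalStateException("timed out waiting for a worker");
                }
                results.add(pair);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }
}
```

Calling TwoQueuePipeline.run(requests, 4) returns one pair per request. Completion order is not guaranteed, which is exactly why the request is bundled with its response.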
If there's a need to accumulate related responses, then you'll need to capture whatever grouping data is necessary and have the Queue 2 listening process accumulate results. Or you can persist the results in a database.
For example, if you know your working set has five elements, you can queue up the requests with that information (1 of 5, 2 of 5, etc.). As each one finishes, the final process can update the database, counting elements. When it sees all of the pieces have been completed (in any order), it marks the result as complete. Later you would have some audit process scan for incomplete jobs that have not finished within some time (perhaps one of the messages erred out), so you can handle them better. Or the original processors can write the request to a separate "this one went bad" queue for mitigation and resubmission.
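The "1 of 5, 2 of 5" bookkeeping can be sketched as below. The answer suggests keeping the count in a database; this hypothetical CompletionTracker keeps it in memory purely to show the logic, so it would lose its state on a restart.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: tracks which parts of a job have finished, in any order.
class CompletionTracker {
    private final Map<String, BitSet> done = new HashMap<>();

    // Records part `index` (1-based) out of `total` for the given job and
    // returns true once every part of that job has been recorded.
    synchronized boolean record(String jobId, int index, int total) {
        BitSet parts = done.computeIfAbsent(jobId, id -> new BitSet(total));
        parts.set(index - 1); // recording the same part twice is harmless
        return parts.cardinality() == total;
    }
}
```

Tracking the individual parts rather than a bare counter means a redelivered message (for example after a transaction rollback) does not push the count past the total.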
If you use JMS with transactions and one of the processors fails, the transaction will roll back and the message will be retained on the queue for processing by one of the surviving processors, so that's another advantage of JMS.
The trick with this kind of processing is to try to push the state along with the message, or to externalize it and send references to the state, thus making each component effectively stateless. This aids scaling and reliability, since any component can fail (besides catastrophic JMS failure, naturally) and you just pick up where you left off when you get the problem resolved and get the components restarted.
If you're in a request/response mode (such as a servlet needing to respond), you can use Servlet 3.0 async servlets to easily put things on hold, or you can put a local object in an internal map, keyed with something such as the Session ID, and then Object.wait() on that object. Then your Queue 2 listener will get the response, finalize the processing, and use the Session ID (sent with the message and retained throughout the pipeline) to look up the object that you're waiting on; it can then simply Object.notify() it to tell the servlet to continue.
Yes, this parks a thread in the servlet container while waiting, and that's why the new async stuff is better, but you work with the hand you're dealt. You can also add a timeout to the Object.wait(); if it times out, the processing took too long, so you can gracefully alert the client.
This basically frees you from filters and such, and reply queues, etc. It's pretty simple to set it all up.
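The map-plus-wait hand-off described above can be sketched as follows. Note that this version swaps raw Object.wait()/Object.notify() for CompletableFuture, which handles the timeout and the "reply arrived before we started waiting" case without extra code. The class PendingReplies and its method names are assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: lets a request thread block until the Queue 2 listener
// delivers the reply for its session ID, or until a timeout expires.
class PendingReplies {
    private final Map<String, CompletableFuture<String>> waiting = new ConcurrentHashMap<>();

    // Called by the request thread BEFORE the request message is sent,
    // so that a fast reply always finds an entry to complete.
    void register(String sessionId) {
        waiting.put(sessionId, new CompletableFuture<>());
    }

    // Called by the request thread after sending. Blocks until the reply
    // arrives; returns null if the timeout expires or the thread is interrupted.
    String await(String sessionId, long timeoutMillis) {
        CompletableFuture<String> future = waiting.get(sessionId);
        if (future == null) {
            throw new IllegalStateException("not registered: " + sessionId);
        }
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            waiting.remove(sessionId); // never leave stale entries behind
        }
    }

    // Called by the Queue 2 listener. Returns false if nobody is waiting
    // any more (for example because the request already timed out).
    boolean complete(String sessionId, String reply) {
        CompletableFuture<String> future = waiting.get(sessionId);
        return future != null && future.complete(reply);
    }
}
```

A servlet would call register, send the message, then call await with its timeout; a null result is the cue to tell the client that processing took too long.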
Well, the actual answer depends on whether your worker entities are external parties, whether they are physically located outside your network, how long a worker is expected to take to finish its work, and so on. But the problem you are trying to solve is one-to-many communication. Did you add JMS to your system just because you want all entities to be able to talk JMS, or because you need asynchronous communication? The former reason does not make sense. If it is the latter, you can choose another communication protocol, such as a one-way web service call.
You can use the latest Java concurrency APIs to make multi-threaded, asynchronous, one-way web service calls to the different worker entities.