Updated:
We receive measurement values from different measurement stations and put them into a JMS queue for processing.
Now we use JMS message grouping, where the group ID is the ID of the station, to ensure that the values from one station are processed serially.
See the stock example in the docs: http://docs.jboss.org/hornetq/2.2.2.Final/user-manual/en/html/message-grouping.html
But in HornetQ (and in JMS generally), each group is pinned to one specific worker. This means that if group A and group B are pinned to worker X, and there are 10 messages for group A and 10 messages for group B in the queue, the group B messages have to wait until the group A messages are handled, even though there are enough free workers that could handle the group B messages right now.
Is there a way to tell JMS not to pin each group to one specific worker, but only to ensure serial processing within each group?
JMS queues do prevent parallel processing of the same message. Have you confirmed that different consumers are processing the same message? Does each message have a unique ID, and are you logging the ID of each message processed?
From the example you gave, it sounds like you are misusing JMS queues. Could you update your question with a clearer, more in-depth example of your workflow?
Right now, with the way your question is worded, it sounds like you need to redesign your workflow model.
Related
We have an ActiveMQ queue which will receive 100k stock order messages (each message contains the stock name, sell price, and bid price in JSON format) per second.
Out of those 100k messages/sec, there can be any number of messages for a single stock. If we receive multiple messages for the same stock, we need to process all of them in the same order using Java.
We can't process 100k messages/second using a single listener on one server.
We need to process them using multiple listeners and servers, but display the results in the UI in the same order they were placed in the queue.
Read Stock Queue → Validate the request → Update the stock price in the UI
Example message:-
{
  "stockName": "TCS",
  "sellPrice": "102",
  "bidPrice": "100"
}
Can you suggest a solution for the above problem?
Here is my proposal:
You need to split the queue into sub-queues based on the stock name. You can split based on the first letter(s) of the stock name. This gives you ample parallelism while ensuring that all messages for the same stock land on one queue.
There will need to be one reader on the main queue, but all it does is forward the messages to the sub-queues.
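A minimal sketch of the splitting rule, using hypothetical sub-queue names like STOCK.SUB.T (none of these names come from the question):

```java
// Hypothetical sketch: derive a sub-queue name from the first letter of
// the stock name, so all messages for one stock land on the same sub-queue.
// The forwarding reader would consume from the main queue and send each
// message on to subQueueFor(stockName).
public class SubQueueRouter {
    public static String subQueueFor(String stockName) {
        char first = Character.toUpperCase(stockName.charAt(0));
        return "STOCK.SUB." + first;
    }
}
```

Note that splitting on first letters can still be skewed, since some letters are far more common in stock names than others.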
I would suggest using non-persistent publication to topics instead of queues. Topics give you the flexibility of:
choosing subscription wildcards and
possibly adding other services to this architecture later. You might not require this now, but maybe in 5 years you will need another GUI, or some form of monitoring or replay service. If you used topics, you can just plug in new subscribers; you don't have to change your publication side for those.
You can use durable subscriptions if you need more persistence.
Message order is guaranteed within the same publication topic so you should make the stock name part of the topic. You could publish on something like ORDER.STOCK.TCS.
But getting a balanced load based on stock names is tricky, because some letters like Z are very rare while others are frequent. So in addition to the stock name, add the stock name's hash % 100 to the topic. For example, if the hash code of TCS were 12357 and you take it modulo 100, you would publish on ORDER.STOCK.TCS.57.
Let's say you have 10 subscribers; each subscriber could then make 10 subscriptions. For example, subscriber 1 would subscribe to ORDER.STOCK.*.0, ORDER.STOCK.*.1, ..., ORDER.STOCK.*.9.
Subscriber 2 would subscribe to ORDER.STOCK.*.10, ORDER.STOCK.*.11, ... ORDER.STOCK.*.19
If you have 5 subscribers, each one does 20 subscriptions (you get the idea).
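The topic-naming scheme above could be sketched as follows (the ORDER.STOCK prefix comes from this answer; the helper itself is illustrative):

```java
// Build the publication topic for a stock: the name itself plus its
// hash modulo 100, so wildcard subscriptions on ORDER.STOCK.*.<n>
// split the load into 100 roughly equal buckets.
public class TopicNamer {
    public static String topicFor(String stockName) {
        // Math.abs guards against negative hashCode values.
        int bucket = Math.abs(stockName.hashCode()) % 100;
        return "ORDER.STOCK." + stockName + "." + bucket;
    }
}
```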
The reason for this is that the hash spreads the stocks evenly across the numeric buckets, so each subscriber gets a roughly equal share of the load regardless of how stock names are distributed.
We had a similar requirement and used an open-source framework called LMAX Disruptor, a reportedly high-performance concurrency framework. You can experiment with it: https://github.com/LMAX-Exchange/disruptor/wiki/Getting-Started.
On a very high level:
Put the stocks received into a ring buffer (the core data structure that the framework is built upon); this component is the consumer for ActiveMQ and the producer for the ring buffer.
The consumers/workers (in your case multiple: one worker thread for each unique stock name) pick up the stocks from the ring buffer in an ordered fashion. In the worker/listener, you can handle each event based on its condition.
I've just committed sample code demonstrating your use case, for your reference:
https://github.com/reddy73/Disruptor-Example
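If adopting Disruptor is not an option, the same per-stock ordering idea can be sketched with standard java.util.concurrent primitives; this is an illustrative alternative, not the Disruptor API:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// One single-threaded worker per stock name: updates for a given stock
// are processed in arrival order, while different stocks run in parallel.
public class PerStockWorkers {
    private final Map<String, ExecutorService> workers = new ConcurrentHashMap<>();

    // Tasks for the same stock name are serialized; tasks for different
    // names may run concurrently.
    public void submit(String stockName, Runnable task) {
        workers.computeIfAbsent(stockName,
                k -> Executors.newSingleThreadExecutor()).submit(task);
    }

    // Best-effort: stop accepting work and wait for queued tasks to finish.
    public void drain() {
        workers.values().forEach(e -> {
            e.shutdown();
            try {
                e.awaitTermination(5, TimeUnit.SECONDS);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        });
    }
}
```

The obvious caveat is unbounded thread growth: with very many distinct stock names you would hash stocks onto a fixed pool of single-threaded workers instead.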
We're using ActiveMQ (5.14.5).
We have a single producer, and multiple consumers on the same queue.
From time to time we set JMSXGroupID to group several messages together to be consumed on a single consumer. This works as expected.
In parallel, the producer continues to send non-grouped messages (i.e., without JMSXGroupID).
The problem:
We noticed that once a consumer is selected to process a specific group, it no longer gets the non-grouped messages, even if it is completely idle. The non-grouped messages are always sent to the other consumers.
That consumer returns to consuming non-grouped messages only after we close the group that was assigned to it (by setting JMSXGroupSeq = -1).
Is this a normal behavior? We expected that non-grouped messages will continue to be delivered in the same round-robin fashion as usual, to all consumers.
We were unable to find a clear reference to this in ActiveMQ documentation.
There's a bit of a no-win situation for the message broker here. If there are active message groups in play, the broker has to assume that further messages will be produced that fall into those groups. So a consumer that has become bound to a particular group needs to remain available to consume later messages of that group, rather than ungrouped messages. After all, an ungrouped message can be handled elsewhere, while a grouped message can't.
However, we also want to have a fair-ish distribution of messages between consumers. So it makes sense that a consumer that is bound to a group, or groups, could take some work when it is idle.
But how do we know it is idle? What happens if a consumer takes a bunch of ungrouped messages (don't forget the default prefetch behaviour), and then new messages arrive that match its specific group?
The fact that closing a group restores the "group consumer" to default behaviour suggests to me that this is not a bug, but a deliberate attempt to make a reasonable compromise in a tricky situation. It seems reasonable to me to ask for a feature to be added, where "group consumers" can take part in ungrouped workload, but I would be inclined to see that as an enhancement.
Just my $0.02, of course.
In my system, there are many users who write blogs. I need to subscribe to different users. There is no centralized system (it's a Swing application).
I am using JMS.
The user may follow one user, two users or 100 users.
m_destination1 = m_session.createQueue("USER.DEVID");
m_consumer1 = m_session.createConsumer(m_destination1);
m_destination2 = m_session.createQueue("USER.HARRY");
m_consumer2 = m_session.createConsumer(m_destination2);
Is there a generic way to write the above lines of code for an unknown number of users, so that one consumer can receive messages from many users?
Wildcards will not work here.
The best thing you can use is the Mirrored Queues feature of ActiveMQ; you can read the documentation here:
http://activemq.apache.org/mirrored-queues.html
What a mirrored queue basically does is forward all messages sent to a queue to a similarly named topic, which can then be subscribed to by multiple consumers.
If you use mirrored queues, you will need your consumers to subscribe to the different topics.
Your design cries out for the publish-subscribe (topic) domain rather than a point-to-point architecture (i.e., queues). Since you already have an architecture that generates a queue for the different people writing blogs, no change to that system would be required, yet your requirement would be catered for.
In addition to this, if 2 consumers listen on a queue, they will pick up messages from the queue in parallel; i.e., if there are 2 messages on the queue, each consumer will process 1 message independently. I don't think that's what you want.
Hope this helps!
Good luck!
@Vihar's answer is right that you should be using the publish-subscribe paradigm, i.e., a topic, to allow multiple consumers to be notified of new blog posts. It sounds like your primary pain point is that you've got one destination per author, and users who want to consume messages have to subscribe to each of them individually.
Instead, have all new-post messages published into a single topic (let's call it NewPostNotificationTopic). Clients can then subscribe to all messages but immediately check them against the list of authors they care about and immediately stop processing any notification for an author they're not following. (This puts the filtering into the message handler rather than into the ActiveMQ network.) This does mean that each message will be passed to each client, but as long as the messages are small and your network is fast and your users are usually connected to the network, this might be a workable solution. But if you can't afford the network bandwidth of sending all messages to all clients, or if your consumers will be offline for long periods of time and you can't afford to hold a copy of all messages till they come back online, this may not work for you.
Alternatively, publish all messages into that same topic, but set the author's ID as a header on the message and use message selectors to tell ActiveMQ to only deliver messages matching a given author ID. This will be more efficient, but you're back to needing to explicitly tell ActiveMQ which authors you care about, either with a single subscription whose selector contains ORs or with one subscription per author. The latter is cleaner but gets you back to your problem of one subscription per author per reader; the former results in only one subscription, but it has to be updated each time you add/remove an author for a reader, and you'll need to make sure you handle the race conditions inherent in removing one subscription and adding another.

I'd go with the first solution I proposed (doing the filtering in the message handler instead of in the ActiveMQ subscriptions) if the performance concerns I raised there aren't a problem; otherwise I'd probably go with one subscription per author per reader, rather than having a single subscription with an ORed selector and needing to redo the subscription each time something changed.
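For the single-subscription-with-ORs option, the selector string could be built from the reader's follow list like this (the authorId header name is an assumption, not something from the question):

```java
import java.util.List;
import java.util.stream.Collectors;

// Build a JMS message selector matching any followed author, assuming
// the producer sets a String property named "authorId" on each message.
// (Author IDs containing quote characters would need escaping.)
public class SelectorBuilder {
    public static String followingSelector(List<String> authorIds) {
        return authorIds.stream()
                .map(id -> "'" + id + "'")
                .collect(Collectors.joining(", ", "authorId IN (", ")"));
    }
}
```

Passing the resulting string as the selector argument when creating the consumer would then deliver only matching posts; the consumer has to be re-created whenever the follow list changes.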
ActiveMQ Message Groups are a wonderful feature for load balancing across multiple consumers. In short: a stream of messages is partitioned across multiple consumers of a single queue according to a group identifier embedded in the message (JMSXGroupID). (So, consumer 1 will get all messages with JMSXGroupID = a, consumer 2 will get all messages with JMSXGroupID = b, and so on.)
Now, imagine you have 2 queues: A and B, and imagine that a consistent taxonomy of JMSXGroupIDs is used in messages flowing through both queues. Will the consumer the broker chooses for JMSXGroupID = ABC on queue A be the consumer from the same connection that the broker chooses for JMSXGroupID = ABC on queue B?
I suspect the answer to the question as I've asked it is "no." There are too many variables in play: What happens if the consumer the broker chooses for A has no corresponding consumer for B? What happens if the consumer the broker chooses for A has multiple corresponding consumers for B? There's no obvious right answer in these cases.
However, can we simulate this behavior? For example, a consumer on a composite destination could be a viable solution -- make sure all consumers on A and B consume on the composite destination A,B and you might be in business -- but ActiveMQ does not appear to support consuming from composite destinations.
The only solution I've found is simply to push messages for both A and B on one single queue -- call it AB -- and have an exclusive consumer on that. You now have to distinguish between "A messages" and "B messages," but you could easily do that with headers.
However, this solution smells funny. (You now have to assume producers will dutifully apply special headers to their messages, or modify your payloads.) Is there a solution that will make sure consumers across two separate queues A and B always land on the same connection?
As you have correctly worked out, message groups apply to a single queue only. There is no coordination between multiple queues.
Generally when you are using message groups you are trying to guarantee message ordering across not just delivery, but processing - so that say all events for a particular entity are processed in sequence. The devil is always in the details of your use case, but putting all of the related messages onto a single queue will give you the result that you are after. In order to process them differently, you then need to put some sort of multiplexing logic into your consumer to make the decision based on the message payload - a well known header as you say is a good candidate for a solution.
To get around the prerequisite of ensuring that clients explicitly set this, what you can do is write a piece of Camel routing logic that does this on your behalf - this is only possible with the broker: component that was added to ActiveMQ 5.9. The idea will be that producers see two separate queues - A & B; the routing logic will read from those queues as the messages are being put in, set the header appropriately, and re-route them to C instead. The routing logic in effect works as an interceptor.
<route id="ConflateA">
  <from uri="broker:queue:A"/>
  <setHeader headerName="OriginalMessageSource">
    <constant>A</constant>
  </setHeader>
  <to uri="broker:queue:C"/>
</route>
You can then use the OriginalMessageSource header in your multiplexing logic.
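On the consumer of queue C, the multiplexing logic could then dispatch on that header; a rough sketch (handler bodies are placeholders):

```java
// Dispatch a message consumed from queue C based on the
// OriginalMessageSource header set by the Camel route above.
public class SourceMultiplexer {
    public static String dispatch(String originalMessageSource, String body) {
        switch (originalMessageSource) {
            case "A": return handleA(body);
            case "B": return handleB(body);
            default:  throw new IllegalArgumentException(
                    "Unknown source: " + originalMessageSource);
        }
    }

    // Placeholder handlers; real ones would hold the A- and B-specific logic.
    static String handleA(String body) { return "A:" + body; }
    static String handleB(String body) { return "B:" + body; }
}
```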
I have a JMS queue that is populated at a very high rate (> 100,000 messages/sec).
It can happen that there are multiple messages pertaining to the same entity every second as well (several updates to the entity, each update being a different message).
On the other end, I have one consumer that processes this message and sends it to other applications.
Now, the whole setup is slowing down, since the consumer is not able to cope with the rate of incoming messages.
Since there is an SLA on the rate at which the consumer processes messages, I have been toying with the idea of having multiple consumers acting in parallel to speed up the process.
So what I'm thinking of doing is:
Multiple consumers acting independently on the queue.
Each consumer is free to grab any message.
After grabbing a message, make sure it's the latest version of the entity. For this part, I can check with the application that processes this entity.
If it's not the latest, bump the version up and try again.
I have been looking through the Enterprise Integration Patterns and the JMS docs, so far without success.
I would welcome ideas to tackle this problem in a more elegant way along with any known APIs, patterns in Java world.
ActiveMQ solves this problem with a concept called "Message Groups". While it's not part of the JMS standard, several JMS-related products work similarly. The basic idea is that you assign each message to a "group" which indicates messages that are related and have to be processed in order. Then you set it up so that each group is delivered only to one consumer. Thus you get load balancing between groups but guarantee in-order delivery within a group.
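A toy model of the pinning behaviour (purely illustrative; this is not ActiveMQ's actual implementation):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// The first message of a group pins that group to a consumer; every
// later message with the same group ID goes to the same consumer, so
// ordering is preserved per group while groups balance across consumers.
public class GroupDispatcher {
    private final int consumerCount;
    private final AtomicInteger next = new AtomicInteger();
    private final Map<String, Integer> pinned = new ConcurrentHashMap<>();

    public GroupDispatcher(int consumerCount) {
        this.consumerCount = consumerCount;
    }

    // Returns the index of the consumer that must receive this group.
    public int consumerFor(String groupId) {
        return pinned.computeIfAbsent(groupId,
                g -> next.getAndIncrement() % consumerCount);
    }
}
```

In the question's terms, the group ID would be the entity's ID, so all updates to one entity stay in order on one consumer.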
Most EIP frameworks and ESBs have customizable resequencers. If the number of entities is not too large, you can have a queue per entity and resequence at the beginning.
For those interested in a way to solve this:
Use the Recipient List EAI pattern.
As the question is about JMS, we can take a look at an example from the Apache Camel website.
This approach is different from other patterns like CBR (Content-Based Router) and Selective Consumer, because the consumer is not aware of which messages it should process.
Let me put this in a real-world example:
We have an Order Management System (OMS) which sends Orders off to be processed by the ERP. Each Order then goes through 6 steps, and each of those steps publishes an event on the Order_queue, announcing the Order's new status. Nothing special here.
The OMS consumes the events from that queue, but MUST process the events of each Order in the very same sequence in which they were published. The rate of messages published per minute is much greater than the consumer's throughput, hence the delay increases over time.
The solution requirements:
Consume in parallel, with as many consumers as needed to keep the queue size reasonable.
Guarantee that events for each Order are processed in the same publish order.
The implementation:
On the OMS side
The OMS process responsible for sending Orders to the ERP determines the consumer that will process all events of a given Order, and sends the Recipient name along with the Order.
How does this process know who the Recipient should be? Well, you can use different approaches, but we used a very simple one: round robin.
On ERP
As the ERP keeps the Recipient's name for each Order, it simply sets up each message to be delivered to the desired Recipient.
On OMS Consumer
We deployed 4 instances, each one using a different Recipient name and processing messages concurrently.
One could say that we created another bottleneck: the database. But that is not true, since there is no concurrency on the order line.
One drawback is that the OMS process which sends the Orders to the ERP must keep track of how many Recipients are working.
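The round-robin choice on the OMS side could be sketched like this (class and Recipient names are illustrative):

```java
import java.util.List;

// Assign each new Order the next Recipient in turn; all events of that
// Order are then stamped with this Recipient name, so one consumer
// instance processes them in publication order.
public class RecipientAssigner {
    private final List<String> recipients;
    private int next = 0;

    public RecipientAssigner(List<String> recipients) {
        this.recipients = recipients;
    }

    public synchronized String assign() {
        String r = recipients.get(next);
        next = (next + 1) % recipients.size();
        return r;
    }
}
```

This is where the drawback above shows up: the list of Recipient names passed in must match the consumer instances actually deployed.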