I'm trying to understand the best way to coalesce or chunk incoming messages in RabbitMQ (using Spring AMQP or the Java client directly).
In other words, I would like to take, say, 100 incoming messages, combine them into one, and resend it to another queue in a reliable (correctly ACKed) way. I believe this is called the aggregator pattern in EIP.
I know Spring Integration provides an aggregator solution, but the implementation doesn't look fail-safe (that is, it appears it has to ACK and consume messages to build the coalesced message, so if you shut it down while it's doing this you will lose messages?).
I can't comment directly on the Spring Integration library, so I'll speak generally in terms of RabbitMQ.
If you're not 100% convinced by the Spring Integration implementation of the Aggregator and are going to implement it yourself, then I would recommend avoiding tx, which uses transactions under the hood in RabbitMQ.
Transactions in RabbitMQ are slow and you will definitely suffer performance problems if you're building a high traffic/throughput system.
Rather, I would suggest you take a look at Publisher Confirms, an extension to AMQP implemented in RabbitMQ. Here is an introduction from when the feature was new: http://www.rabbitmq.com/blog/2011/02/10/introducing-publisher-confirms/.
You will also need to tweak the prefetch setting to get the performance right; take a look at http://www.rabbitmq.com/blog/2012/05/11/some-queuing-theory-throughput-latency-and-bandwidth/ for some details.
All the above gives you some background to help solve your problem. The implementation is rather straightforward.
When creating your consumer you will need to ensure it is set up so that an explicit ACK is required. Then:
1. Dequeue n messages; as you dequeue, make a note of the DeliveryTag for each message (this is used to ACK it later).
2. Aggregate the messages into a new message.
3. Publish the new message.
4. ACK each dequeued message.
One thing to note: if your consumer dies after step 3 and before step 4 has completed, the messages that weren't ACKed will be redelivered and reprocessed when it comes back to life.
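To make that concrete, here is a minimal sketch of steps 1-4 using the plain RabbitMQ Java client with publisher confirms. The queue names and the "aggregation" (naive concatenation) are placeholders, and basicGet is used for brevity where a real consumer would use basicConsume with a tuned prefetch:

// Minimal sketch of the dequeue/aggregate/publish/ack loop described above.
// Queue names and the aggregation logic are placeholders, not from this answer.
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.GetResponse;

import java.io.ByteArrayOutputStream;

public class AggregatingForwarder {

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {

            channel.confirmSelect(); // enable publisher confirms on this channel

            ByteArrayOutputStream aggregate = new ByteArrayOutputStream();
            long lastDeliveryTag = 0;
            int count = 0;

            // 1. Dequeue up to 100 messages without auto-ack, noting the delivery tags
            while (count < 100) {
                GetResponse response = channel.basicGet("input.queue", false); // autoAck = false
                if (response == null) {
                    break; // queue drained before a full batch
                }
                lastDeliveryTag = response.getEnvelope().getDeliveryTag();
                aggregate.write(response.getBody()); // 2. aggregate (naive concatenation)
                count++;
            }

            if (count > 0) {
                // 3. Publish the combined message and wait for the broker to confirm it
                channel.basicPublish("", "aggregated.queue", null, aggregate.toByteArray());
                channel.waitForConfirmsOrDie();

                // 4. Ack everything consumed so far; multiple = true acks up to lastDeliveryTag
                channel.basicAck(lastDeliveryTag, true);
            }
        }
    }
}

If the process dies between the publish and the final basicAck, the unacked messages are simply redelivered, which is exactly the duplicate-processing scenario noted above.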
If you set the <amqp-inbound-channel-adapter/> tx-size attribute to 100, the container will ack every 100 messages so this should prevent message loss.
However, you might want to make the send of the aggregated message (on the 100th receive) transactional so you can confirm the broker has the message before the ack for the inbound messages.
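I can't speak for every version of the XML namespace, but the adapter sits on top of Spring AMQP's SimpleMessageListenerContainer, which exposes the same knob; a rough Java-config equivalent might look like this (the queue name and listener are placeholders, and setTxSize was renamed in later Spring AMQP releases):

import org.springframework.amqp.core.AcknowledgeMode;
import org.springframework.amqp.core.MessageListener;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;

public class ContainerConfig {

    // Rough equivalent of tx-size="100" on the inbound adapter: the container
    // acks every 100 deliveries. Queue name and listener body are placeholders.
    public SimpleMessageListenerContainer container(ConnectionFactory cf) {
        SimpleMessageListenerContainer container = new SimpleMessageListenerContainer(cf);
        container.setQueueNames("input.queue");              // assumed queue name
        container.setAcknowledgeMode(AcknowledgeMode.AUTO);  // container handles the acks
        container.setTxSize(100);                            // ack every 100 messages
        container.setChannelTransacted(true);                // optional, for a transactional send
        container.setMessageListener((MessageListener) message -> {
            // aggregate and forward the message here
        });
        return container;
    }
}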
Related
I am building a high-volume system that will be processing up to a hundred million messages every day. I have a microservice that reads messages from a Kafka topic and does some basic processing on them before forwarding them to the next microservice.
Kafka Topic -> Normalizer Microservice -> Ordering Microservice
Below is what the processing would look like:
Normalizer would be concurrently picking up messages from the Kafka topic.
Normalizer would read the messages from the topic and post them to an in-memory seda queue, from where they would subsequently be picked up, normalized and validated.
This normalization, validation and processing is expected to take around 1 second per message. Within this one second, the message will be stored to the database and will become persistent in the system.
My concern is that during this processing, if a message has already been read from the topic and posted to the seda queue and has either
not yet been picked up from the seda queue, or
been picked up from the seda queue and is currently being processed but has not yet been persisted to the database,
and the Normalizer JVM crashes or is force-killed (kill -9), how do I ensure that I do NOT lose the message?
It is critical that I do NOT drop/lose any messages and even in case of a crash/failure, I should be able to retain the message such that I can trigger re-processing of that message if required.
One naïve approach that comes to mind is to push the message to a cache (which will be a very fast operation).
Read from topic -> Push to cache -> Push to seda queue
Needless to say, the problem still exists; this just makes it less probable that I will lose the message. Also, this is certainly not the smartest solution out there.
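To make that naive approach concrete, here is a rough sketch in Java; the cache and the in-memory queue are just stand-ins (the real thing would be an external cache and Camel's seda endpoint), and the topic name is made up:

// Rough sketch of the naive "read -> cache -> seda" idea from above.
// The cache, queue and topic name are placeholders; only the ordering of steps matters.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class NaiveNormalizerIntake {

    private final Map<String, String> cache = new ConcurrentHashMap<>();        // stand-in for a fast external cache
    private final BlockingQueue<String> sedaQueue = new LinkedBlockingQueue<>(); // stand-in for the seda queue

    public void run(Properties consumerProps) throws InterruptedException {
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(Collections.singletonList("incoming-orders")); // assumed topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    String key = record.topic() + "-" + record.partition() + "-" + record.offset();
                    cache.put(key, record.value()); // 1. park the raw message somewhere safe(ish)
                    sedaQueue.put(record.value());  // 2. hand it to the normalization stage
                }
                // With enable.auto.commit=true the offset may be committed before
                // normalization has persisted the message, which is exactly the window
                // described above.
            }
        }
    }
}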
Please share your thoughts on how I can design this system such that I can preserve messages on my side once the messages have been read off of the Kafka topic even in the event of the Normalizer JVM crashing.
According to Apache Camel documentation, "Camel supports the Guaranteed Delivery from the EIP patterns using among others the following components: ... JMS."
I'm trying to understand if this means I can use JMS in the middle of a multi-component route to "guarantee delivery."
For example, I have some routes that look like this:
from("rest://post:someRestRoute")
// blah blah
.to("jms:queue:someQueue");
from("jms:queue:someQueue")
// blah blah
.to("spring-ws:someAddress")
.to("someOtherRoute");
Does using JMS in the middle of a multi-component route have any benefits? Camel is writing to and reading from the queue, and the queue is running on the same computer and same JVM, so Camel is only guaranteeing delivery to itself, which seems redundant.
For example,
A message is POSTed to someRestRoute.
The message is queued and persisted on someQueue.
The message is immediately dequeued.
The message is sent to a webservice at someAddress.
As I understand it, as far as the JMS broker is concerned the message is "delivered" the moment it's successfully dequeued; it doesn't matter if spring-ws:someAddress throws an exception. I suppose this might be helpful if Camel crashed immediately after step 2, but I was hoping to guarantee delivery to someAddress.
Does using JMS in the middle of a multi-component route have any benefits? Can it be used to "guarantee delivery" to someAddress in the example?
Only if your JMS queue is defined to persist messages. Then, if your route/application/server is stopped before picking up the message, it will stay in the queue until it is processed the next time the route starts.
If you do not need persistence then there is no reason to have JMS.
Camel can be set to retry delivery to the endpoint in case of failure.
But if you need persistence, JMS is the best (if not the only) way to do it.
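As a rough illustration of both points (persistent delivery into the queue, plus Camel retrying the downstream endpoint), something along these lines should work; the endpoint URIs are taken from the question, while the redelivery numbers and the explicit deliveryPersistent option are illustrative:

// Sketch only: persistent delivery onto the queue and Camel-level redelivery
// toward the web service. Numbers and options are illustrative.
import org.apache.camel.builder.RouteBuilder;

public class GuaranteedDeliveryRoutes extends RouteBuilder {

    @Override
    public void configure() {
        // Retry delivery to the endpoint a few times before giving up.
        errorHandler(defaultErrorHandler()
                .maximumRedeliveries(5)
                .redeliveryDelay(1000));

        from("rest://post:someRestRoute")
            // deliveryPersistent=true asks the broker to write the message to disk,
            // so it survives a broker/JVM restart while it sits on the queue
            .to("jms:queue:someQueue?deliveryPersistent=true");

        from("jms:queue:someQueue")
            .to("spring-ws:someAddress");
    }
}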
I'm mostly using Kafka for traditional messaging but I'd also like the ability to consume small topics in a batch fashion, i.e. connect to a topic, consume all the messages and immediately disconnect (not block waiting for new messages). All my topics have a single partition (though they are replicated across a cluster) and I'd like to use the high-level consumer if possible. It's not clear from the docs how I could accomplish such a thing in Scala (or Java). Any advice gratefully received.
The consumer.timeout.ms setting will cause a timeout exception to be thrown if no message is consumed within the specified time, and as far as I know this is the only option you have with the high-level consumer. Using this, you could set it to something like 1 second and disconnect after that, if that's an acceptable solution.
If not, you'd have to use the simple consumer and check message offsets.
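For the first option, a rough Java sketch with the old high-level consumer (the 0.8-era kafka.consumer API) might look like the following; the topic name, group id and ZooKeeper address are placeholders:

// Drain a small topic using the old high-level consumer and consumer.timeout.ms.
// Topic name, group id and ZooKeeper address are placeholders.
import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerTimeoutException;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class DrainTopic {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // placeholder
        props.put("group.id", "drain-group");             // placeholder
        props.put("auto.offset.reset", "smallest");       // start from the beginning
        props.put("consumer.timeout.ms", "1000");         // give up after 1s of silence

        ConsumerConnector connector = Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(Collections.singletonMap("small-topic", 1));
        try {
            for (MessageAndMetadata<byte[], byte[]> mm : streams.get("small-topic").get(0)) {
                process(mm.message());
            }
        } catch (ConsumerTimeoutException e) {
            // no message arrived within consumer.timeout.ms: assume the topic is drained
        } finally {
            connector.shutdown();
        }
    }

    private static void process(byte[] message) {
        // handle the message
    }
}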
I am beginning to implement an ActiveMQ-based messaging service to send worker tasks to various servers. However, I am noticing that in the default mode, if no one is "listening" to a producer's topic, any message from that producer will be lost.
I.e.,
If a producer sends a message to a live broker
but no consumer is there to listen,
the message goes nowhere.
I would like instead for the Broker to hold on to messages until at least one listener receives them.
I am trying a couple of ways of implementing this, but I'm not sure of the most optimal/right way:
Implement a Message Acknowledgement feature
(The caveat to this is that I need the producer to wait on its listener after every message, which seems very, very clunky and a last resort...)
Implement the Session Transaction
(I am having trouble with this one; it sounds like the right thing to use here because of the word transaction, but I think it has more to do with the producer-broker interaction, not the producer-consumer interaction.)
Ideally, there would be a mode to send a message (or a set of messages), and after sending, a Boolean would be returned stating whether the message(s) were received by at least one consumer.
Transactions and acknowledgement somewhat conflict with the general idea of a JMS topic.
Just use a queue instead of a topic. Access this queue using CLIENT_ACKNOWLEDGE or a transacted session. A worker task is to be processed by one worker only anyway, so the queue solves another problem.
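For illustration, a minimal consumer along those lines with the ActiveMQ client might look like this; the broker URL and queue name are placeholders:

// Minimal sketch: consume worker tasks from a queue with CLIENT_ACKNOWLEDGE.
// Broker URL and queue name are placeholders.
import javax.jms.Connection;
import javax.jms.MessageConsumer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

import org.apache.activemq.ActiveMQConnectionFactory;

public class WorkerConsumer {

    public static void main(String[] args) throws Exception {
        ActiveMQConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();

        // CLIENT_ACKNOWLEDGE: the broker keeps the message until we explicitly ack it,
        // so a task is not lost if this worker dies mid-processing.
        Session session = connection.createSession(false, Session.CLIENT_ACKNOWLEDGE);
        Queue queue = session.createQueue("worker.tasks"); // placeholder queue name

        MessageConsumer consumer = session.createConsumer(queue);
        TextMessage message = (TextMessage) consumer.receive();

        // ... process the task ...

        message.acknowledge(); // only now is the message removed from the queue

        connection.close();
    }
}

Because it is a queue rather than a topic, the broker holds messages sent while no consumer is connected, which is exactly the behavior asked for.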
If there was a special reason to use topics, you could consider a message-driven bean (MDB) on the same host as the JMS provider (you could achieve this by using JBoss with its integrated HornetQ, for example), but this is still not really correct.
Another possibility is to have both a topic and a queue. The latter is only for guaranteed delivery of each message.
This isn't really a typical messaging pattern. Typically, you have one receiver and a durable queue, or multiple receivers with durable subscriptions to a topic. In either situation, each receiver will always receive the message. I don't really understand a use case where "at least one" receiver should receive it.
And yes, transactions only deal with the interactions between client and broker, not between client and eventual receiver(s).
I am trying to find an answer on how to notify an EMS Publisher in case of a Subscriber failure.
In a setup of Publisher -> EMS server -> Subscriber, if a Subscriber fails, I need to inform the Publisher to take corrective action. I am not bothered about durability/PERSISTENCE; what matters to me is time. In trading systems, if I send a market order to a Subscriber who in turn sends it to an exchange, and it fails, I need to make my Publisher publish the messages on a different topic to another Subscriber (another exchange).
Any ideas are appreciated.
The tibjmsadmin.jar library contains methods to detect when subscribers disconnect. Easier than writing code, you can:
if you have Hawk, use the tibjmsadmin.hma to write a Hawk rule in the event a subscriber disconnects, or
listen on the monitor topic $sys.monitor.connection.disconnect; the body of the message tells you which subscriber disconnected.
However, these "monitoring" approaches to failing over the publisher have a significant problem: in the time it takes for you to detect the subscriber failure and redirect the publisher, some messages may get through and get stuck in the defunct queue. You don't want this happening to any $10M trades!
EMS knows when subscribers are connected or not and you should take advantage of this.
Use a "distributed queue" and there shouldn't be any need to code logic into your application to switch to a new subscriber when it fails. This happens without message loss and maintains the order of messages. It is also good architectural practice to keep load balancing and failover logic out of your code and in the administration setup of your JMS provider.
Basically you set up multiple subscribers to a queue (each exchange represented by a subscriber). The default action will be for EMS to load-balance messages across your subscribers in a round-robin fashion. But you can set the queue to "exclusive" so that messages go to only one subscriber at a time. Then, if that active subscriber fails, the messages are forwarded to another subscriber.
See the EMS manual for more details on all these topics.
Not sure if you have access, but you could try looking at the ReceiverCount or ConsumerCount in either QueueInfo or TopicInfo; I believe you need the tibjms.admin package. Maybe you can query this before you publish and then selectively publish? Not sure what the overhead is.
Because of the nature of JMS, AFAIK no transaction states (unless you use an XA transaction, with all of that overhead) or acknowledgements will propagate through the EMS broker. I.e. acks are always between publisher and broker, and between consumer and broker.
Failing the above, you could try a separate ack topic for which the roles are reversed, but then the failure case is a timeout - I'm not sure this is sensible.
If you don't really care which exchange the order goes to, why not make the topic/queue exclusive and have both consumers attempt to consume? The first one to succeed will process all of the messages; if it dies, the second one (which could be periodically retrying) may successfully connect. Alternatively, allow both to process orders off the queue; remember, a message will only ever be processed by a single consumer.
I really cannot see the advantage of a decoupled messaging bus in your order flow... it makes no sense to me.