Being new to Apache Camel, I was recently reviewing its long list of components and stumbled upon their support for SEDA queue components.
The page didn't make much sense to me, so I did a couple of online searches for the term "SEDA queue" and found the Wikipedia article here.
After reading that article, I can't tell what the difference is between a SEDA queue and a normal, "ordinary" queue! Both embrace the notion of decoupling systems through the use of asynchronous queues.
From the article, "SEDA" just sounds like an architecture that consists of placing a queue between each component. Is this correct?
But if it's just an architecture, then why is a "SEDA" queue a special Apache Camel component?
SEDA is an acronym that stands for Staged Event-Driven Architecture. It is designed as a mechanism to regulate the flow between different phases of message processing. The idea is to smooth out the frequency of message output from an overall process so that it matches the input. It allows an endpoint's consumer threads to offload the work of long-running operations into the background, thereby freeing them up to consume messages from the transport.
When an exchange is passed to a seda: endpoint, it is placed into a BlockingQueue. The queue exists within the Camel context, which means that only those routes that are within the same context can be joined by this type of endpoint. The queue is unbounded by default, although that can be changed by setting the size attribute on the URI of the consumer.
By default, a single thread assigned to the endpoint reads exchanges off the queue and processes them through the route. It is possible to increase the number of concurrentConsumers to ensure that exchanges are processed from that queue in a timely fashion.
The SEDA pattern is best suited to processing InOnly messages, where one route finishes processing and hands off to another to deal with the next phase. It is possible to ask for a response from a seda: endpoint by calling it when the message exchange pattern is InOut.
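To make the mechanics concrete, here is a minimal plain-Java sketch of what a seda: endpoint does conceptually: a bounded in-memory BlockingQueue drained by a configurable number of consumer threads. It does not use Camel at all; the class name SedaStageSketch, the sentinel value, and the upper-casing "work" are assumptions made up for illustration, and the two parameters only mirror the size and concurrentConsumers options mentioned above.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a seda: endpoint (not Camel code).
class SedaStageSketch {
    private static final String POISON = "__stop__"; // sentinel telling a consumer to exit

    // Pushes every message through the stage and returns what the consumers processed.
    static List<String> run(List<String> messages, int queueSize, int concurrentConsumers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(queueSize);
        List<String> processed = new CopyOnWriteArrayList<>();
        CountDownLatch done = new CountDownLatch(concurrentConsumers);

        for (int i = 0; i < concurrentConsumers; i++) {
            Thread consumer = new Thread(() -> {
                try {
                    while (true) {
                        String msg = queue.take();        // blocks until work arrives
                        if (msg.equals(POISON)) break;
                        processed.add(msg.toUpperCase()); // stands in for the next stage's slow work
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
            consumer.start();
        }

        try {
            for (String msg : messages) queue.put(msg);   // the producer blocks only when the queue is full
            for (int i = 0; i < concurrentConsumers; i++) queue.put(POISON);
            if (!done.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("consumers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed;
    }
}
```

Calling SedaStageSketch.run(List.of("a", "b", "c"), 2, 3) processes all three messages, but with more than one consumer the order of the result is not guaranteed, which is the usual trade-off when raising the consumer count.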
Reference:
Apache Camel Developer's Cookbook
SEDA queues are just like regular queues (and as Peter said above, in Camel they have a thread pool associated with them as part of the component). SEDA is an architecture. The SEDA component in Camel uses in-memory queues inside your process, and it is a separate component in order to distinguish it from the other queue component in Apache Camel, namely the JMS component.
SEDA offers decoupling of the components within a single Camel route, or for that matter within a single process. That means it helps you make async calls to other components. It's an in-memory BlockingQueue.
JMS, on the other hand, is used for decoupling whole systems. JMS will have an external broker involved. SEDA will just create a separate thread from the consumer component.
Related
I'm building an application using RabbitMQ/Spring/Spring AMQP and am having trouble handling the way I've laid out my queues.
Essentially I have one queue that every consumer listens to, with each message basically saying "this queue is ready to be processed by a single consumer". The consumer will then listen to the queue indicated in the message, consume all the messages in that queue, and finally delete it when done.
These short lived queues are all created on the fly as data comes in to be processed and cannot be consumed by multiple consumers (whichever gets the message in the 'ready' queue).
I'm having trouble gracefully handling the consumers in this situation. Right now I just create a new DirectMessageListenerContainer each time a consumer gets a message from the 'ready' queue and then stop it once it has gotten all the messages it needs. It seems like this solution isn't ideal. Is there any better way to handle a situation like this with Spring AMQP/RabbitMQ?
You can add/remove queues to/from existing container(s) at runtime; it is more efficient with the direct container (see Choosing a container).
The MessageProperties has the consumerQueue property to tell you which queue the message came from.
I'm currently adding JMS support to an application-server-like framework. The JMS will be implemented by HornetQ (stand-alone broker, HornetQ jars on the server's classpath), but there is neither JBoss nor Spring nor anything else that would provide MDBs.
The next step is to add a message listener to an XA queue that would allow for parallel processing of incoming messages. Some messages would init long-running tasks, so the basic idea is to spawn worker threads from the onMessage method.
On my long journey through the internet I came across this discussion, where one of the participants mentioned that he would not do that but would use an extra internal queue for the task: the (single-threaded) message listener would then simply grab the messages from the inbound queue and create new messages for an internal queue, where at the other end of that internal queue some worker threads fight for the incoming messages. Inbound messages would then be acknowledged once they're "copied" to the internal queue (which is OK for me).
Unfortunately they don't say why it would be better not to spawn worker threads from the onMessage method; maybe because the listener would block if all threads from the pool are busy. So I'm looking for pros and cons of these design decisions:
Start worker threads from the onMessage method of the message listener
Use an internal queue to "send messages to the worker threads"
Transaction limits aside, whether or not to have multiple threads (or processes) reading from a queue simply comes down to whether or not the message order is important. Obviously, if the order is important, then a single thread naturally maintains that order, while multiple threads will provide no such guarantee.
What you will normally find is that order is important, but only across a subset of all the messages. In this scenario, if a single thread isn't performant, you need to get those messages off the queue and re-queued in as short a time as possible, because to preserve the order you'll have to use a single thread reading from the initial queue; hence the use of one or more internal queues. The problem this incurs is that the transaction will be closed before the messages are fully processed, so you need some sort of temporary storage to ensure messages don't get dropped if the process were to fall over before the processing had taken place.
If, as your question suggests, you're not too worried about dropping messages then the java.util.concurrent.BlockingQueue sounds like what you need for the internal queues with a single thread servicing each.
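As a rough sketch of "internal queues with a single thread servicing each", the listener thread below routes each message to a lane chosen from a grouping key, so messages that share a key stay in order while different keys can run in parallel. The class KeyedDispatcherSketch, the key/payload strings, and the hash-based lane choice are assumptions for illustration. Each lane is a single-thread executor, which is backed by its own unbounded in-memory queue, so anything still queued is lost if the process dies, as noted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical dispatcher: one internal queue plus one worker thread per lane.
class KeyedDispatcherSketch {
    private final List<ExecutorService> lanes = new ArrayList<>();
    private final Map<String, List<String>> handled = new ConcurrentHashMap<>();

    KeyedDispatcherSketch(int laneCount) {
        for (int i = 0; i < laneCount; i++) {
            lanes.add(Executors.newSingleThreadExecutor()); // single thread per lane keeps lane order
        }
    }

    // Called from the single listener thread (e.g. from onMessage); returns immediately.
    void dispatch(String key, String payload) {
        int lane = Math.floorMod(key.hashCode(), lanes.size()); // same key always maps to the same lane
        lanes.get(lane).execute(() ->
                handled.computeIfAbsent(key, k -> new CopyOnWriteArrayList<>()).add(payload));
    }

    // Stops accepting work, waits for the lanes to drain, and returns what was handled per key.
    Map<String, List<String>> shutdownAndCollect() {
        try {
            for (ExecutorService lane : lanes) {
                lane.shutdown();
                if (!lane.awaitTermination(10, TimeUnit.SECONDS)) {
                    throw new IllegalStateException("lane did not drain in time");
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return handled;
    }
}
```

In a real listener the lambda body would be the long-running task; here it only records the payload so the per-key ordering can be observed.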
I have a web-app where when the user submits a request, we send a JMS message to a remote service and then wait for the reply. (There are also async requests, and we have various niceties set up for message replay, etc, so we'd prefer to stick with JMS instead of, say, HTTP)
In How should I implement request response with JMS?, ActiveMQ seems to discourage the idea of either temporary queues per request or temporary consumers with selectors on the JMSCorrelationID, due to the overhead involved in spinning them up.
However, if I use pooled consumers for the replies, how do I dispatch from the reply consumer back to the original requesting thread?
I could certainly write my own thread-safe callback-registration/dispatch, but I hate writing code I suspect has already been written by someone who knows better than I do.
That ActiveMQ page recommends Lingo, which hasn't been updated since 2006, and Camel Spring Remoting, which has been hellbanned by my team for its many gotcha bugs.
Is there a better solution, in the form of a library implementing this pattern, or in the form of a different pattern for simulating synchronous request-reply over JMS?
Related SO question:
Is it a good practice to use JMS Temporary Queue for synchronous use?, which suggests that spinning up a consumer with a selector on the JMSCorrelationID is actually low-overhead, which contradicts what the ActiveMQ documentation says. Who's right?
In a past project we had a similar situation, where a sync WS request was handled with a pair of async req/res JMS messages. We were using the JBoss JMS impl at that time, and temporary destinations were a big overhead.
We ended up writing a thread-safe dispatcher, leaving the WS waiting until the JMS response came in. We used the CorrelationID to map the response back to the request.
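A dispatcher of that kind could look roughly like the sketch below. This is not the code from that project; it is an assumed minimal version in which the requesting thread registers a CompletableFuture under the correlation ID before sending, and the pooled reply consumer completes it when the reply arrives. The class and method names are invented, and the JMS send/receive calls are left out so the example stays self-contained.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical correlation-ID dispatcher shared by request threads and reply consumers.
class ReplyDispatcherSketch {
    private final Map<String, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    // Request thread: register BEFORE sending, so a fast reply cannot arrive unnoticed.
    CompletableFuture<String> register(String correlationId) {
        CompletableFuture<String> future = new CompletableFuture<>();
        if (pending.putIfAbsent(correlationId, future) != null) {
            throw new IllegalArgumentException("duplicate correlation id: " + correlationId);
        }
        return future;
    }

    // Reply-consumer thread: hand the reply to whoever is waiting; false if nobody is.
    boolean onReply(String correlationId, String body) {
        CompletableFuture<String> future = pending.remove(correlationId);
        return future != null && future.complete(body);
    }

    // Request thread: block for the reply; returns null on timeout or interruption.
    String await(String correlationId, CompletableFuture<String> future, long timeoutMillis) {
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pending.remove(correlationId, future); // never leak entries for abandoned requests
        }
    }
}
```

Late replies for a request that already timed out are simply reported as undeliverable by onReply, which leaves the caller free to log or discard them.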
That solution was all home grown, but I've come across a nice blocking map impl that solves the problem of matching a response to a request.
BlockingMap
If your solution is clustered, you need to take care that response messages are dispatched to the right node in the cluster. I don't know ActiveMQ, but I remember JBoss messaging to have some glitches under the hood for their clusterable destinations.
I would still think about using Camel and let it handle the threading, perhaps without spring-remoting but just raw ProducerTemplates.
Camel has some nice documentation about the topic and works very well with ActiveMQ.
http://camel.apache.org/jms#JMS-RequestreplyoverJMS
As for your question about spinning up a selector-based consumer and the overhead: what the ActiveMQ docs actually state is that it requires a round trip to the ActiveMQ broker, which might be on the other side of the globe or on a high-delay network. The overhead in this case is the TCP/IP round-trip time to the AMQ broker. I would consider this as an option. I have used it multiple times with success.
A colleague suggested a potential solution-- one response queue/consumer per webapp thread, and we can set the return-address to the response queue owned by that particular thread. Since these threads are typically long-lived (and are re-used for subsequent web requests), we only have to suffer the overhead at the time the thread is spawned by the pool.
That said, this whole exercise is making me rethink JMS vs HTTP... :)
I have always used CorrelationID for request / response and never suffered any performance issues. I can't imagine why that would be a performance issue at all, it should be super fast for any messaging system to implement and quite an important feature to implement well.
http://www.eaipatterns.com/RequestReplyJmsExample.html has the two mainstream solutions, using replyToQueue or correlationID.
It's an old one, but I've landed here searching for something else and actually do have some insights (hopefully will be helpful to someone).
We have implemented a very similar use case, with Hazelcast being our chassis for the cluster's internode communication. The essence is 2 datasets: 1 distributed map for responses, 1 'local' list of response awaiters (on each node in the cluster).
Each request (receiving its own thread from Jetty) creates an entry in the map of local awaiters; the entry holds the correlation UID and an object that will serve as a semaphore.
Then the request is dispatched to the remote (REST/JMS) and the original thread starts waiting on the semaphore; the UID must be part of the request.
The remote returns the response and writes it into the responses map with the correlated UID.
The responses map is listened to; if the UID of a newly arrived response is found in the map of local awaiters, its semaphore is notified and the original request's thread is released, picks up the response from the responses map, and returns it to the client.
This is a general description; I can update the answer with a few optimizations we have, if there is any interest.
We have a simple project which takes a number of messages from a number of endpoints (agents). These agents all output the same message format (an entity object to be placed in a database). All the agents write to one queue, and we consume these messages and send them to a database via JPA.
So essentially the system has a collection of producers writing messages to one queue. The queue is single threaded and just takes the messages as they come and dumps them into the database.
The issue here is that this method is slow. Is there any functionality in Camel (like re-sequencing) that we could use to split out these messages based on their source? So while the messages from Agent1 need to be persisted in the order they are created, the messages from Agent2 are separate, so they should not wait on the order of Agent1's messages. For two agents this is an easy problem, as we just create two queues, one for each agent. We have a number of agents, so we need a solution that can scale.
Are there any patterns to accomplish this natively in Camel? We could write our own holdout queue which syncs on the agent name and only ever puts one message through to a multi-threaded JPA write queue, but this would be a bit of a roundabout way to do things, as we would need to either set up a callback from the queue to the JPA Camel route, or not use Camel and just do it via our own manager (not that this would be complex, but it would be great if we could do this all using Camel or something else out there and not have to reinvent the wheel, so to speak).
If your source is a JMS message queue, then take a look at message groups.
Apache ActiveMQ documents about this here:
http://activemq.apache.org/message-groups.html
And this FAQ
http://activemq.apache.org/how-do-i-preserve-order-of-messages.html
Basically you can use the JMSXGroupID JMS property to mark the agent id, e.g. agent1, agent2, etc.
Then you can have concurrent consumers on the JMS message queue, which can run in parallel based on the JMSXGroupID while still preserving ordering within each group. That means you can write to JPA in parallel for agent1, agent2, ... agentN.
There is one controlling entity and several 'worker' entities. The controlling entity requests certain data from the worker entities, which they will fetch and return in their own manner.
Since the controlling entity can be agnostic about the worker entities (and the worker entities can be added/removed at any point), putting a JMS provider in between them sounds like a good idea. That's the assumption at least.
Since it is a one-to-many relation (controller -> workers), a JMS Topic would be the right solution. But, since the controlling entity depends on the return values of the workers, request/reply functionality would be nice as well (somewhere, I read about the TopicRequester but I cannot seem to find a working example). Request/reply is typical Queue functionality.
As an attempt to use topics in a request/reply sort of way, I created two JMS topics: request and response. The controller publishes to the request topic and is subscribed to the response topic. Every worker is subscribed to the request topic and publishes to the response topic. To match requests and responses, the controller will subscribe to the response topic for each request with a filter (using a session id as the value). The messages workers publish to the response topic have the session id associated with them.
Now this does not feel like a solution (rather it uses JMS as a hammer and treats the problem (and some more) as a nail). Is JMS in this situation a solution at all? Or are there other solutions I'm overlooking?
Your approach sort of makes sense to me. I think a messaging system could work. I think using topics is wrong. Take a look at the wiki page for Enterprise Service Bus. It's a little more complicated than you need, but the basic idea for your use case is that you have a worker that is capable of reading from one queue, doing some processing, and adding the processed data back to another queue.
The problem with a topic is that all workers will get the message at the same time and they will all work on it independently. It sounds like you only want one worker at a time working on each request. I think you have it as a topic so different types of workers can also listen to the same queue and only respond to certain requests. For that, you are better off just creating a new queue for each type of work. You could potentially have them in pairs, so you have a work_a_request queue and work_a_response queue. Or if your controller is capable of figuring out the type of response from the data, they can all write to a single response queue.
If you haven't chosen a message queue vendor yet, I would recommend RabbitMQ, as it's easy to set up, easy to add new queues (especially dynamically), and has really good Spring support (although most major messaging systems have Spring support, and you may not even be using Spring).
I'm also not sure what you are accomplishing with the filters. If you ensure the messages to the workers contain all the information needed to do the work and the response messages back contain all the information your controller needs to finish the processing, I don't think you need them.
I would simply use two JMS queues.
The first one is the one that all of the requests go on. The workers will listen to the queue, and process them in their own time, in their own way.
Once complete, they will bundle the request with the response and put that on another queue for the final process to handle. This way there's no need for the submitting process to retain the requests; they just follow along with the entire procedure. A final process will listen to the second queue and handle the request/response pairs appropriately.
If there's no need for the message to be reliable, or if there's no need for the actual processes to span JVMs or machines, then this can all be done with a single process and standard java threading (such as BlockingQueues and ExecutorServices).
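For the single-process case, a minimal sketch of the two-queue layout might look like this. The names (TwoQueuePipelineSketch, Pair, the "done:" prefix) are invented for illustration, and the workers here simply exit once the pre-filled request queue is empty, which a long-running service would not do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical in-process version of "queue 1 -> workers -> queue 2 -> final process".
class TwoQueuePipelineSketch {
    // A response bundled with the request that produced it, so nobody has to remember the request.
    static final class Pair {
        final String request;
        final String response;
        Pair(String request, String response) {
            this.request = request;
            this.response = response;
        }
    }

    static List<Pair> run(List<String> requests, int workers) {
        BlockingQueue<String> requestQueue = new LinkedBlockingQueue<>(requests); // queue 1, pre-filled
        BlockingQueue<Pair> responseQueue = new LinkedBlockingQueue<>();          // queue 2
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                String request;
                // Each worker competes for requests until queue 1 is empty.
                while ((request = requestQueue.poll()) != null) {
                    responseQueue.add(new Pair(request, "done:" + request));
                }
            });
        }
        pool.shutdown(); // no new tasks; running workers finish on their own

        // The "final process": reads request/response pairs from queue 2.
        List<Pair> results = new ArrayList<>();
        try {
            for (int i = 0; i < requests.size(); i++) {
                Pair pair = responseQueue.poll(10, TimeUnit.SECONDS);
                if (pair == null) throw new IllegalStateException("timed out waiting for a response");
                results.add(pair);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return results;
    }
}
```

Because each Pair carries its own request, the final process can handle results in whatever order the workers finish.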
If there's a need to accumulate related responses, then you'll need to capture whatever grouping data is necessary and have the Queue 2 listening process accumulate results. Or you can persist the results in a database.
For example, if you know your working set has five elements, you can queue up the requests with that information (1 of 5, 2 of 5, etc.). As each one finishes, the final process can update the database, counting elements. When it sees all of the pieces have been completed (in any order), it marks the result as complete. Later you would have some audit process scan for incomplete jobs that have not finished within some time (perhaps one of the messages erred out), so you can handle them better. Or the original processors can write the request to a separate "this one went bad" queue for mitigation and resubmission.
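The "1 of 5, 2 of 5" bookkeeping can be sketched as below, using an in-memory map where the answer suggests a database. PartTrackerSketch and its method names are hypothetical. Storing part numbers in a set (instead of incrementing a counter) means a redelivered message is not counted twice.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical completion tracker for jobs that were split into numbered parts.
class PartTrackerSketch {
    private final Map<String, Set<Integer>> partsSeen = new ConcurrentHashMap<>();

    // Records "part index of total" for a job, in any order.
    // Returns true if the job is complete after this call.
    boolean record(String jobId, int index, int total) {
        Set<Integer> parts = partsSeen.computeIfAbsent(jobId, id -> ConcurrentHashMap.newKeySet());
        parts.add(index); // adding the same index again has no effect
        return parts.size() == total;
    }

    // For an audit pass: how many distinct parts of this job have arrived so far.
    int received(String jobId) {
        Set<Integer> parts = partsSeen.get(jobId);
        return parts == null ? 0 : parts.size();
    }
}
```

An audit process could periodically compare received(jobId) with the expected total to find jobs that have stalled.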
If you use JMS with transactions and one of the processors fails, the transaction will roll back and the message will be retained on the queue for processing by one of the surviving processors, so that's another advantage of JMS.
The trick with this kind of processing is to try to push the state with the message, or externalize it and send references to the state, thus making each component effectively stateless. This aids scaling and reliability, since any component can fail (besides catastrophic JMS failure, naturally), and you just pick up where you left off once you get the problem resolved and get it restarted.
If you're in a request/response mode (such as a servlet needing to respond), you can use Servlet 3.0 async servlets to easily put things on hold, or you can put a local object in an internal map, keyed with something such as the Session ID, and then Object.wait() on that object. Then, your Queue 2 listener will get the response, finalize the processing, and use the Session ID (sent with the message and retained throughout the pipeline) to look up the object that you're waiting on; it can then simply Object.notify() it to tell the servlet to continue.
Yes, this sticks a thread in the servlet container while waiting; that's why the new async stuff is better, but you work with the hand you're dealt. You can also add a timeout to the Object.wait(); if it times out, the processing took too long, so you can gracefully alert the client.
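A minimal sketch of the wait/notify variant with a timeout is shown below, assuming a hypothetical WaitNotifySketch class and string responses. The wait sits in a loop guarded by a ready flag, so a spurious wakeup or a response that arrives before the waiter starts waiting is handled correctly.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical internal map of waiting requests, keyed by session ID.
class WaitNotifySketch {
    private static final class Slot {
        private String response;
        private boolean ready;
    }

    private final Map<String, Slot> waiting = new ConcurrentHashMap<>();

    // Servlet thread: call before the request is put on the first queue.
    void register(String sessionId) {
        waiting.put(sessionId, new Slot());
    }

    // Servlet thread: blocks until the response arrives; returns null on timeout or interruption.
    String awaitResponse(String sessionId, long timeoutMillis) {
        Slot slot = waiting.get(sessionId);
        if (slot == null) throw new IllegalStateException("not registered: " + sessionId);
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try {
            synchronized (slot) {
                while (!slot.ready) {                  // loop guards against spurious wakeups
                    long remaining = deadline - System.currentTimeMillis();
                    if (remaining <= 0) return null;   // took too long: let the caller alert the client
                    slot.wait(remaining);
                }
                return slot.response;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        } finally {
            waiting.remove(sessionId);
        }
    }

    // Queue 2 listener thread: looks up the waiter by session ID and wakes it.
    boolean deliver(String sessionId, String response) {
        Slot slot = waiting.get(sessionId);
        if (slot == null) return false;                // nobody is waiting (for example, it timed out)
        synchronized (slot) {
            slot.response = response;
            slot.ready = true;
            slot.notifyAll();
        }
        return true;
    }
}
```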
This basically frees you from filters and such, and reply queues, etc. It's pretty simple to set it all up.
Well, the actual answer depends on whether your worker entities are external parties, whether they are physically located outside your network, how long a worker entity is expected to take to finish its work, and so on. But the problem you are trying to solve is one-to-many communication. Did you add JMS to your system just because you want all entities to be able to talk JMS, or because you need asynchronous communication? The former reason does not make sense. If it is the latter, you can choose another communication protocol, such as a one-way web service call.
You can use the latest Java concurrency APIs to make multi-threaded, asynchronous, one-way web service calls to the different worker entities.