Maintaining ordering in a multithreaded Apache Camel application

We use Tibco EMS as our messaging system and have written our application with Apache Camel. In our application, messages are written to a queue. A component, with concurrentConsumers set to 8, reads from the queue, processes the message, then writes to another queue. Another component, again with concurrentConsumers set to 8, then reads from this new queue, and so on. Up until now, maintaining message order has not been important, but a new requirement means that it now is. The Camel documentation suggests using JMSXGroupID to maintain ordering. Unfortunately, this functionality is not available with Tibco EMS. Are there any other ways of maintaining ordering in Camel in a multithreaded application? I have looked at sticky load balancing, but this seems to apply to endpoint load balancing only.
Thanks
Bruce

In the enterprise integration world, the Resequencer design pattern is the usual answer to problems where message ordering must be guaranteed.
Apache Camel implements a broad range of the Enterprise Integration Patterns out of the box, including the Resequencer. So what you are looking for should be this:
http://camel.apache.org/resequencer.html
In your specific case, all you need to do is add a custom message header, such as myMessageNo, carrying a sequential number that specifies the ordering, to the messages sent to TIBCO EMS. Then, on the consumer side, use the resequencer EIP to restore the order of the messages coming from TIBCO EMS.
As you can see, however, it's not as easy as just dropping the resequencer EIP into your Camel routes. (Asynchronous solutions are always hard to build correctly.) For the resequencer, you need to consider the sad paths, e.g. when a message gets lost and never arrives. To make sure your routes keep working in those exceptional cases, you need to choose between two options: a maximum batch size or a timeout. Depending on the condition chosen, the resequencer flushes messages when the batch reaches the maximum size or when it times out waiting for a missing message.
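To make the size-based flush concrete, here is a minimal sketch in plain Java. It is not Camel's implementation, and the class name SimpleResequencer and its methods are hypothetical. It buffers out-of-order messages by sequence number (the value the myMessageNo header would carry), releases them as soon as the next expected number arrives, and, once the buffer reaches its capacity, gives up on the missing message and flushes what it has in order. A timeout-based flush would do the same thing, triggered by a timer instead of the buffer size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration of a resequencer; not Camel's implementation.
class SimpleResequencer {
    private final TreeMap<Long, String> buffer = new TreeMap<>();
    private final int capacity;   // flush once this many messages are waiting
    private long nextExpected;    // next sequence number to release

    SimpleResequencer(long firstSeqNo, int capacity) {
        this.nextExpected = firstSeqNo;
        this.capacity = capacity;
    }

    // Accepts one message and returns the messages that can now be released, in order.
    synchronized List<String> offer(long seqNo, String body) {
        List<String> released = new ArrayList<>();
        if (seqNo < nextExpected) {
            // Arrived after its gap was given up on; this sketch simply drops it.
            return released;
        }
        buffer.put(seqNo, body);
        drainContiguous(released);
        if (buffer.size() >= capacity) {
            // Give up on the missing message and skip to the lowest buffered number.
            nextExpected = buffer.firstKey();
            drainContiguous(released);
        }
        return released;
    }

    // Releases buffered messages for as long as they continue the sequence.
    private void drainContiguous(List<String> released) {
        while (!buffer.isEmpty() && buffer.firstKey() == nextExpected) {
            released.add(buffer.pollFirstEntry().getValue());
            nextExpected++;
        }
    }
}
```

With a capacity of 3, a lost message 3 delays messages 4 and 5 until message 6 arrives; then 4, 5 and 6 are released together. A larger capacity tolerates more reordering but holds messages back longer when one really is lost.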

Related

Should I use one or two queues?

I'm building a microservice that produces messages for an ActiveMQ broker.
My possible messages are:
1) Logs for my application.
2) The business messages I need.
Later I'll develop a microservice that consumes those messages, and I thought that it could be better to have two different queues at ActiveMQ.
My question is, should I use 2 queues, or should I use 1 queue with a flag to differentiate messages?

Microservices are about segregation of responsibilities and a loosely coupled architecture that can be extended later on.
If you identify the message type by a flag:
The distinction is hardcoded even though the messages are unrelated
The architecture becomes highly coupled
Queue maintenance and scaling will suffer later on
and so on.
I would recommend using a different queue for each type of message, so that each queue serves a single purpose.

RabbitMQ implementation details

I recently watched a nice presentation about how RabbitMQ works, and it got me intrigued about how the whole AMQP implementation works.
I was considering using it for a project but I would like some answers about the following questions:
1) Is it possible to have a broker and a producer of a message at the same place? I do understand that RabbitMQ allows the use of Virtual Hosts so something like this could be possible right?
2) Can RabbitMQ transmit its messages across two different subnets? I know it can transmit over LAN or WAN, but how easy is it to do this across two subnets? (One answer here would actually be to have them bridged.)
3) Regarding question 1, how hard would it be to fail over the broker functionality to another place in case the original broker goes down?
4) I do understand that RabbitMQ provides different types of message transmission. One of those is the fanout type, which is more or less similar to a broadcast. Would it be possible, though, to have the inverse of that, meaning multiple producers with multiple queues that all transmit to a single consumer?

1) It doesn't matter where the consumers/producers are, as long as they can reach (access IP:port) the broker. Virtual hosts have nothing to do with that.
2) More or less the same as the answer to the first one: RabbitMQ uses the network and has no knowledge of what kind of network it is in; it also doesn't need to know or care.
3) Failover is easy; look for RabbitMQ clustering and high availability. On the client side you have to take care of it yourself (how to reconnect, etc.).
4) Yes, broadcast is possible; have a look at the tutorials and the kinds of exchanges available. EDIT: As zapl pointed out in the comments, you can also do the inverse of broadcast, for example by having several producers publish to one queue that a single consumer reads.

Is Spring Integration suitable for web-farm processing of "reliable queue"?

Sorry if title is confusing, let me explain my question.
Our team needs to develop a web service which is supposed to run on several nodes (web farm, horizontal scaling). We know how to implement this "manually", but we're pretty excited about Spring Integration, which is new to us, so we're trying to understand whether it is a good fit for our scenario. If so, we'll try to make use of it.
Typical scenario:
Several servers ("nodes") running the same web application (let's call it "OurWebService")
We need to pull files from external systems ("InboundExtSystems")
Process this data with help of other external systems (involves local resource-consuming operations) ("UtilityExtServices")
Submit processing results to another set of external systems ("OutboundExtSystems")
Non-functional requirements:
For performance reasons we cannot query UtilityExtServices on demand, and local processing is also CPU-intensive. So we need a queue to control the pace at which we perform requests and process results
We expect several nodes to pull tasks from this queue evenly and process them
We need to make sure that every queued task pulled from InboundExtSystems is handled; none of them may disappear.
We need to make sure timeouts are handled as well. If task processing times out, we need to "requeue" the task (and make sure the previous handler will not submit results for it)
We need to be able to perform rolling updates. Say 5 nodes are processing the queue: we want to be able to stop, upgrade and start each node in turn without noticeably impacting system performance.
So the question is: is Spring Integration a perfect fit for such a case?
If the answer is "Yes", could you kindly name the primary components we should use?
P.S. Sure enough, we would probably also need to pick something as a message bus and queue accessible by every node (maybe Redis, Hazelcast or RabbitMQ; not sure which is more appropriate)

Yes, it's a good fit. I would suggest RabbitMQ for the transport/queuing and the Spring Integration AMQP endpoints.
Rolling updates shouldn't be an issue (unless you change the format of the messages sent between nodes). But even then you could handle it relatively easily by moving to a new set of queues.
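The timeout requirement from the question (requeue a task that timed out, and make sure the earlier handler cannot submit results for it) can be pictured with a lease token per claim. The sketch below is plain Java with hypothetical names (TaskLeases, claim, complete); it is not a Spring Integration or RabbitMQ API, and it keeps its state in memory, whereas a farm of nodes would need that state in the shared broker or store.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical single-process sketch of lease tokens for requeue-on-timeout.
class TaskLeases {
    private static final class Lease {
        final long token;
        final long expiresAtMillis;

        Lease(long token, long expiresAtMillis) {
            this.token = token;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Lease> leases = new HashMap<>();
    private final Set<String> completed = new HashSet<>();
    private final long timeoutMillis;
    private long nextToken = 1;

    TaskLeases(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Hands the task to a worker unless it is finished or another lease is still valid.
    // Returns the new lease token, or -1 if the task cannot be claimed right now.
    synchronized long claim(String taskId, long nowMillis) {
        if (completed.contains(taskId)) {
            return -1;
        }
        Lease current = leases.get(taskId);
        if (current != null && nowMillis < current.expiresAtMillis) {
            return -1;
        }
        long token = nextToken++;
        leases.put(taskId, new Lease(token, nowMillis + timeoutMillis));
        return token;
    }

    // Accepts a result only from the worker holding the most recent lease.
    synchronized boolean complete(String taskId, long token) {
        Lease current = leases.get(taskId);
        if (current == null || current.token != token) {
            return false; // stale worker: the task was requeued and claimed again
        }
        leases.remove(taskId);
        completed.add(taskId);
        return true;
    }
}
```

A worker that overran its timeout still holds the old token, so its call to complete is rejected once another node has claimed the task again.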

Multi Threading vs JMS Queue for Asynchronous Logging

Requirement: Log events like page views and form submits. Each page has a ~1 second SLA. The application can have hundreds of concurrent users at a time.
Log events are stored into the Database.
Solution: My initial thought was to use an async logging approach where the control returns back to the application and the logging happens in a different thread (via Spring's Thread pool task executor).
However, someone suggested that using JMS would be a more robust approach. Is the added work (setting up queues, writing to them, reading from them) worthwhile?
What are some of the best practices / things to look out for (in a production environment) when implementing something like this?

Both approaches are valid, but the first is vulnerable if your app stops unexpectedly: events not yet written to the database will be lost. With a persistent JMS queue, those events will be read from the queue and persisted to the database upon restart.
Of course, if your DB writes are so much slower than placing a message of similar size on to a JMS queue, you may be solving the wrong problem?
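For comparison, the in-memory approach from the question might look roughly like the sketch below. It uses a plain java.util.concurrent.ThreadPoolExecutor rather than Spring's task executor, and the class name AsyncEventLogger and the sink callback (standing in for the database write) are hypothetical. The queue is bounded, so a slow database causes events to be counted as dropped instead of blocking the request thread, and anything still in the queue when the JVM dies is lost.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical in-memory async logger; events still queued when the JVM dies are lost.
class AsyncEventLogger {
    private final ThreadPoolExecutor executor;
    private final Consumer<String> sink;              // stands in for the database write
    private final AtomicLong dropped = new AtomicLong();

    AsyncEventLogger(int queueCapacity, Consumer<String> sink) {
        this.sink = sink;
        this.executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                // Queue full: count the event as dropped instead of blocking the caller.
                (task, pool) -> dropped.incrementAndGet());
    }

    // Returns immediately; the write happens on the single worker thread.
    void log(String event) {
        executor.execute(() -> sink.accept(event));
    }

    long droppedCount() {
        return dropped.get();
    }

    // Orderly shutdown: waits for queued events to be written; false if the wait timed out.
    boolean shutdown(long timeoutMillis) {
        executor.shutdown();
        try {
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The single worker thread keeps events in submission order. Whether dropping on overflow is acceptable depends on how much the logged events matter; blocking or falling back to a file are the obvious alternatives.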

Using JMS for logging is a complete mismatch. JMS is a Java abstraction over middleware tools like MQ Series. That is complete overkill, and will put you through setup and configuration hell. JMS also lets you place messages in a transactional context, so you quickly get the idea that JMS might not be much better than database writes, as @rjsang suggested.
This is not to say that JMS is not a nice technology. It is a good technology when it is applied properly.
For asynchronous logging, you are better off depending on a logging API that supports it directly, like Log4j2. In your case, you might look at configuring an AsyncAppender with a JDBCAppender. Log4j2 has many more appenders as additional options, including one for JMS. By at least using a logging abstraction, you make all of that configurable and can change your mind at a later time.
In the future we might have something like asynchronous CDI events, which should work similarly to JMS but be much more lightweight. Maybe you can get something similar to work by combining CDI events with EJB asynchronous methods. As long as you don't use EJBs with a remote interface, it should also be pretty lightweight.

You could try going fully async with external tooling if you want to. If you have to stick to your SLA at any price and resilience is important to you, you could use logstash or process your logs offline. By doing so, you decouple your application from the database and no longer depend on database performance. If the database is slow and you're using async loggers, queues might run full.
With logstash using GELF the whole log processing is handled within a different (or even remote) JVM. Offline processing (e.g. you write CSV logs) allows you to load the log data afterwards into the database.

Java EE design pattern for MDB

We are designing an application in which MDBs will pick up incoming messages and do a series of tasks.
Some of these are functional like XML validation and some are like aspects such as logging, MIS entries etc.
Edit:
Message Types can be different based on the functionality such as Ordering or Raising Faults or Information Services like Postcode Lookups. These also vary on the Caller so an Order for Caller A is different from B, but mostly the XML structure should be the same.
Each will go through functional units of work such as Validation of Sender, Validation of product codes (if placing an order), Security checks (IP based), Registration into our DB if valid, Error Queue if not and so on.
We want to make the functional bits modular, so that we can build one MDB which does functions A, B, C and another MDB which does B, C, D and so on, depending on the type of message and on which tasks are common across all the message types.
What design pattern should I be using for this?
Secondly, is there a way for me to configure these functions in an XML file, so that the MDB reads the XML to see which functions it has to execute and in what sequence? This is as an alternative to having the modules in Helper POJOs or Session Beans which are linked from the main MDB which is what we currently thought of.

@shinynewbike: for your problem, it would be better to use the MDB just to read the message and determine its type. The MDB can then consult a factory class that returns a list of handlers implementing the same interface, and iterate over that list calling each one. So basically a Command design pattern. A sample XML configuration:
<configuration>
<handler name="A" class="A"/>
<handler name="B" class="B"/>
...
<handlers-stack name="stack1">
<handler ref="A"/>
<handler ref="C"/>
</handlers-stack>
<message type="X" handlers-ref="stack1"/>
<message type="Y" handlers-ref="stack2"/>
</configuration>
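On the Java side, the handler lookup described above could be sketched as follows. The names MessageHandler and HandlerRegistry are hypothetical, and the stacks are registered in code here; a real version would populate the registry by parsing the XML configuration. The MDB's onMessage would only determine the message type and call dispatch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; an MDB's onMessage would call HandlerRegistry.dispatch.
interface MessageHandler {
    void handle(String payload, List<String> audit);
}

class HandlerRegistry {
    private final Map<String, MessageHandler> handlers = new HashMap<>();
    private final Map<String, List<String>> stacksByMessageType = new HashMap<>();

    // Equivalent of a <handler name="..." class="..."/> entry.
    void registerHandler(String name, MessageHandler handler) {
        handlers.put(name, handler);
    }

    // Equivalent of a <handlers-stack> bound to a <message type="..."/> entry.
    void registerStack(String messageType, List<String> handlerNames) {
        stacksByMessageType.put(messageType, new ArrayList<>(handlerNames));
    }

    // Runs the configured handlers in order and returns what they recorded.
    List<String> dispatch(String messageType, String payload) {
        List<String> names = stacksByMessageType.get(messageType);
        if (names == null) {
            throw new IllegalArgumentException("No handler stack for message type " + messageType);
        }
        List<String> audit = new ArrayList<>();
        for (String name : names) {
            MessageHandler handler = handlers.get(name);
            if (handler == null) {
                throw new IllegalStateException("Unknown handler " + name);
            }
            handler.handle(payload, audit);
        }
        return audit;
    }
}
```

Changing which functions run for a message type, or their order, then only means changing the registered stack, not the MDB.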

Strategy, probably, with the quirk that each MDB can have several strategies. If you want to configure the set of strategies that a bean uses in a file (or an env-entry or similar), then you'll have to obtain references to the strategies via JNDI, rather than having them injected, which is a minor pain.
In a non-EJB world, I would suggest Observer, but with EJBs, I think it's rather hard to have one component give another a long-lived reference to itself. Unless they're @Singleton, which MDBs aren't.

Pattern terms aside, we have strived at our company to keep business logic out of the MDB class itself. This works really well for what you are trying to build here, which almost sounds more like an Enterprise Service Bus (ESB) Service Gateway pattern. Check out the following links from MSDN (good page even though it isn't Java) and Martin Fowler.
I would recommend allowing the MDB to take in the messages. Then you could use other patterns (Command, Strategy, Factory, etc) to do the actual work. Or, the main MDB could figure out where the message should be forwarded to and then forward the message to a queue dedicated to a particular type of function.
This does add some administrative and resource overhead from the perspective of more queues and MDB's. But it also adds a bit more separation between the different logic for the differing messages (ie, separation of concerns). And it also gives you the ability to throttle the differing "implementation" queues differently depending on performance needs, rather than having one queue be the bottleneck for all.
There are performance considerations to adding new queues. I wish I could give you a concrete answer as to "how much", but I can't, as it depends on what you choose for your application server and your JMS and/or messaging provider. Unfortunately, there is no magic number. You really have to sit down with other architects and discuss how many queues you need. It is best to do this upfront as part of your design, so that you settle on a number of queues. Next, try to figure out the load on the system. How big will your messages be? 100KB? 1MB? 5MB? Larger? Smaller? And how many messages will be coming through the system at a time? With numbers like these you can revisit your decision on the number of queues and see if it still makes sense. You can also have your application server/messaging admins (or you, if that happens to be you) throttle the queues with different configuration settings to allow for smoother messaging through your system. (You may also need to tune the application/messaging server's JVM heap settings, depending on what you encounter.)
Sadly, the best way to gain performance by throttling your application server is reading, reading, reading whatever the documentation and forums say about it, plus the experience of working with it yourself.
But even with all that, the most important thing is a good yet simple design. If you go overboard and make a queue for everything, you may impact performance. Then again, you may not. But you might overcomplicate your application and make it harder to troubleshoot.
I'll try to find some more links for you, but honestly take what we all say here and discuss it with your fellow developers.
And if you could alter your question to mention how many message types you might deal with or their purpose, we all could give you a better recommendation along the design, number of queues, etc.

While I realize that this does not squarely answer your question, you might want to have a look at Apache Commons Chain.
